Abstract

This article studies the limiting behavior of a class of robust population covariance matrix estimators, originally due to Maronna in 1976, in the regime where both the number of available samples and the population size grow large. Using tools from random matrix theory, we prove that the difference between the sample covariance matrix and (a scaled version of) such a robust estimator tends to zero in spectral norm, almost surely, this being valid irrespective of the distribution of the population vector entries. This result is applied to prove that recent subspace methods arising from random matrix theory can be made robust without altering their first order behavior.

I. Introduction



Many multivariate signal processing detection and estimation techniques are based on the empirical covariance matrix of a sequence of samples $x_1,\ldots,x_n$ from a random population vector $x\in\mathbb{C}^N$. Assuming $\mathrm{E}[x]=0$ and $\mathrm{E}[xx^*]=C_N$, the strong law of large numbers ensures that, for independent and identically distributed (i.i.d.) samples,

$$S_N\triangleq\frac{1}{n}\sum_{i=1}^n x_ix_i^*\longrightarrow C_N$$

almost surely (a.s.), as the number $n$ of samples increases. Many subspace methods, such as the multiple signal classification (MUSIC) algorithm and its derivatives [1], [2], heavily rely on this property by identifying $C_N$ with $S_N$, leading to appropriate approximations of functionals of $C_N$ in the large $n$ regime. However, this standard approach has two major limitations: its inherent inadequacy for small sample sizes (when $n$ is not much larger than $N$) and its lack of robustness to outliers or heavy-tailed distributions of $x$. Although the former issue was probably the first to be historically recognized, it is only recently that significant advances have been made using random matrix theory [3]. As for the latter, it spurred a strong wave of interest in the seventies, starting with the work of Huber [4] on robust M-estimation. The objective of this article is to provide a first bridge between the two disciplines by introducing new fundamental results on robust M-estimates in the random matrix regime where both $N$ and $n$ grow large at the same rate.

Aside from its obvious simplicity of analysis, the sample covariance matrix (SCM) $S_N$ is an object of primal interest since it is the maximum likelihood estimator of $C_N$ for $x$ Gaussian. When $x$ is not Gaussian, however, the SCM may perform very poorly as an approximation of $C_N$. This problem was identified in multiple areas such as multivariate signal processing or financial asset management, but was particularly recognized in adaptive radar and sonar processing where the signals under study are characterized by impulsive noise and outlying data. Robust estimation theory aims at tackling this problem [5]. Among other solutions, the so-called robust M-estimators of the population covariance matrix, originally introduced by Huber [4] and investigated in the seminal work of Maronna [6], have imposed themselves as an appealing alternative to the SCM. This estimator, which we denote $\hat C_N$, is defined implicitly as a solution of¹

$$\hat C_N=\frac{1}{n}\sum_{i=1}^n u\left(\frac{1}{N}x_i^*\hat C_N^{-1}x_i\right)x_ix_i^* \qquad (1)$$

for $u$ a nonnegative function with specific properties. These estimators are particularly appropriate as they are the maximum likelihood estimates of $C_N$ for specific distributions of $x$ and some specific choices of $u$, such as the family of elliptical distributions [7]. For any such $u$, $\hat C_N$ is, up to a scalar, a consistent estimate of $C_N$ for $N$ fixed and $n\to\infty$. Robust estimators are also used to cope with distributions of $x$ with heavy tails or showing a tendency to produce outliers, such as when $\|x\|^2$ has a K-distribution, often met in the context of adaptive radar processing with impulsive clutter [8].

A second angle of improvement of subspace methods has recently emerged due to advances in random matrix theory. The latter aims at studying the statistical properties of matrices in the regime where both $N$ and $n$ grow large. It is known in particular that, if $x=A_Ny$ with $y\in\mathbb{C}^M$, $M\geq N$, a vector of independent entries with zero mean and unit variance, then, under some conditions on $C_N=A_NA_N^*$ and $y$, in the large $N,n$ (and $M$) regime, the eigenvalue distribution of (almost every) $S_N$ converges weakly to a limiting distribution described implicitly by its Stieltjes transform [9]. When $C_N$ is the identity matrix for all $N$, this distribution takes an explicit form known as the Marcenko-Pastur law [10]. Under some additional moment conditions on the entries of $y$, it has also been shown that the eigenvalues of $S_N$ cannot lie infinitely often away from the support of the limiting distribution [11]. In the past ten years, these two results and subsequent works have been applied to revisit classical signal processing techniques such as signal detection schemes [12] or subspace methods [13], [14]. In these works, traditional $n$-consistent detection and estimation methods were improved into $(N,n)$-consistent approaches, i.e. they provide estimates that are consistent in the large $N,n$ regime rather than in the fixed $N$ and large $n$ regime. These improved estimators are often referred to as G-estimators.

¹Our expression differs from the standard convention where $x_i^*\hat C_N^{-1}x_i$ is traditionally not scaled by $1/N$. The current form is however more convenient for analysis in the large $N,n$ regime.

In this article, we study the asymptotic first order properties of the robust M-estimate $\hat C_N$ of $C_N$, given by (1), in the regime where $N$, $n$ (and $M$) grow large simultaneously, hereafter referred to as the random matrix regime. Although the study of the SCM $S_N$ for vectors $x$ with rather general distributions is accessible to random matrix theory, as e.g. in the case of elliptical distributions [15], the equivalent analysis for $\hat C_N$ is often very challenging. In the present article, we restrict ourselves to vectors $x$ of the type $x=A_Ny$ with $y$ having independent zero-mean entries. One important technical challenge brought by the matrix $\hat C_N$, usually not met in random matrix theory, lies in the dependence structure between the vectors $\{u(\frac1N x_i^*\hat C_N^{-1}x_i)x_i\}_{i=1}^n$ (as opposed to the independent vectors $\{x_i\}_{i=1}^n$ for the matrix $S_N$). We fundamentally rely on the set of assumptions on the function $u$ taken by Maronna in [6] to overcome this difficulty. Our main contribution consists in showing that, in the large $N,n$ regime, and under some mild assumptions, $\|\hat C_N-\alpha S_N\|\to0$ a.s., for some constant $\alpha>0$ dependent only on $u$. This result is in particular in line with the conjecture made in [16] according to which $\|\hat C_N-\alpha S_N\|\to0$ for the function $u(s)=1/s$ studied extensively by Tyler [17], [18]; however, the function $u(s)=1/s$ does not enter our present scheme as it creates additional difficulties, which leave the conjecture open.

A major practical consequence of our result is that the matrix $S_N$, at the core of many random matrix-based estimators, can be straightforwardly replaced by $\hat C_N$ without altering the first order properties of these estimators. We generically call the induced estimators robust G-estimators. As an application example, we provide in this article a robust direction-of-arrival estimator, referred to as robust G-MUSIC, based on the G-MUSIC estimator from Mestre [19].

The remainder of the article is structured as follows. Section II provides our theoretical results. Section III 
introduces the robust G-MUSIC estimator. Section IV then concludes the article. All technical proofs are detailed 
in the appendices. 

Notations: The arrow $\xrightarrow{\text{a.s.}}$ denotes almost sure convergence. For $A\in\mathbb{C}^{N\times N}$ Hermitian, $\lambda_1(A)\leq\ldots\leq\lambda_N(A)$ are its ordered eigenvalues. The norm $\|\cdot\|$ is the spectral norm for matrices and the Euclidean norm for vectors. For $A,B$ Hermitian, $A\succeq B$ means that $A-B$ is nonnegative definite. The notation $A^*$ denotes the Hermitian transpose of $A$. We also write $\imath=\sqrt{-1}$.

II. Main results 

Let $X=[x_1,\ldots,x_n]\in\mathbb{C}^{N\times n}$, where $x_i=A_Ny_i\in\mathbb{C}^N$, with $y_i=[y_{i1},\ldots,y_{iM}]^{\mathsf T}\in\mathbb{C}^M$ having independent entries with zero mean and unit variance, $A_N\in\mathbb{C}^{N\times M}$, and let $C_N=A_NA_N^*\in\mathbb{C}^{N\times N}$ be a positive definite matrix. We denote $c_N=N/n$, $\bar c_N=M/N$, and define the sample covariance matrix $S_N$ of the sequence $x_1,\ldots,x_n$ by

$$S_N=\frac1n\sum_{i=1}^n x_ix_i^*=\frac1n XX^*.$$
Let $u:\mathbb{R}_+\to\mathbb{R}_+$ ($\mathbb{R}_+=[0,\infty)$) be a function fulfilling the following conditions:
(i) $u$ is nonnegative, nonincreasing, and continuous on $\mathbb{R}_+$;
(ii) the function $\phi:\mathbb{R}_+\to\mathbb{R}_+$, $s\mapsto su(s)$ is nondecreasing and bounded, with $\sup_x\phi(x)=\phi_\infty>1$. Moreover, $\phi$ is increasing in the interval where $\phi(s)<\phi_\infty$.

Classical M-estimators $\hat C_N$ defined by (1) for such a function $u$ include the Huber estimator, with $\phi(s)=\frac{\phi_\infty}{\phi_\infty-1}s$ for $s\in[0,\phi_\infty-1]$, $\phi_\infty>1$, and $\phi(s)=\phi_\infty$ for $s>\phi_\infty-1$. Since $u(s)$ is constant for $s<\phi_\infty-1$ and decreases for $s>\phi_\infty-1$, this estimator weights the majority of the samples $x_1,\ldots,x_n$ by a common factor and reduces the impact of the outliers. The widely used function $u(s)=(1+t)(t+s)^{-1}$ for some $t>0$ shows similar properties, here with $\phi_\infty=1+t$. In either scenario, robustness can be controlled by properly setting $\phi_\infty$.
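The two weight functions above can be sketched in Python as follows. This is a minimal sketch: the names `u_huber`, `u_rational`, `phi`, and `phi_inverse_one` are ours, and for the Huber case we assume the linear branch $\phi(s)=\frac{\phi_\infty}{\phi_\infty-1}s$, which is the one implied by continuity at $s=\phi_\infty-1$. The bisection solves $\phi(s)=1$, i.e. it returns the scaling $\phi^{-1}(1)$ that appears in Theorem 1.

```python
import numpy as np


def u_huber(s, phi_inf):
    """Huber-type weight (assumed form): u is constant up to the knee
    s = phi_inf - 1, then u(s) = phi_inf / s so that phi(s) = phi_inf."""
    s = np.asarray(s, dtype=float)
    knee = phi_inf - 1.0
    # np.maximum avoids a division by zero in the unused branch at s = 0
    return np.where(s <= knee, phi_inf / knee, phi_inf / np.maximum(s, knee))


def u_rational(s, t):
    """u(s) = (1 + t) / (t + s) with t > 0, for which phi_inf = 1 + t."""
    return (1.0 + t) / (t + np.asarray(s, dtype=float))


def phi(u, s, **params):
    """phi(s) = s * u(s)."""
    s = np.asarray(s, dtype=float)
    return s * u(s, **params)


def phi_inverse_one(u, lo=0.0, hi=1e6, iters=200, **params):
    """Solve phi(s) = 1 by bisection; valid because phi is increasing
    where phi < phi_inf and phi_inf > 1 (conditions (i)-(ii))."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if phi(u, mid, **params) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(phi_inverse_one(u_rational, t=2.5))       # close to 1.0
print(phi_inverse_one(u_huber, phi_inf=3.0))    # close to 2/3
```

For $u(s)=(1+t)/(t+s)$, solving $(1+t)s/(t+s)=1$ gives $s=1$ for every $t$, so $\phi^{-1}(1)=1$ for that whole family.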

To proceed, we need the following statistical assumptions on the large dimensional random matrices under study.

A1. The random variables $y_{ij}$, $i\leq n$, $j\leq M$, are independent, either real or circularly symmetric complex (i.e. $\mathrm{E}[y_{ij}^2]=0$), with $\mathrm{E}[y_{ij}]=0$ and $\mathrm{E}[|y_{ij}|^2]=1$. Also, there exist $\eta>0$ and $\alpha>0$ such that, for all $i,j$,

$$\mathrm{E}\left[|y_{ij}|^{8+\eta}\right]<\alpha.$$

A2. $\bar c_N\geq1$ and, as $n\to\infty$,

$$0<\liminf_n c_N\leq\limsup_n c_N<1,\qquad\limsup_n\bar c_N<\infty.$$

A3. There exist $C_-,C_+>0$ such that

$$C_-<\liminf_N\{\lambda_1(C_N)\}\leq\limsup_N\{\lambda_N(C_N)\}<C_+.$$
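A minimal sketch of a data model satisfying A1-A3, for experimentation only. The choices below are illustrative assumptions, not taken from the article: real Student-t entries with `df = 12` degrees of freedom, rescaled to unit variance (so that moments of order $8+\eta$ exist for any $\eta<4$), $M=N$, and a diagonal $A_N$ so that the eigenvalues of $C_N$ lie in $[1,3]$. The helper `generate_samples` is hypothetical.

```python
import numpy as np


def generate_samples(N, n, rng, df=12.0):
    """Return X = A_N Y (N x n) and C_N = A_N A_N^T under an assumed model.

    Y has i.i.d. real Student-t entries rescaled to zero mean and unit
    variance (A1: E|y|^(8+eta) < inf for eta < df - 8). Here M = N and
    A_N is diagonal, so C_N has eigenvalues in [1, 3] (A3)."""
    scale = np.sqrt((df - 2.0) / df)          # Var(t_df) = df / (df - 2)
    Y = scale * rng.standard_t(df, size=(N, n))
    A = np.diag(np.sqrt(np.linspace(1.0, 3.0, N)))
    return A @ Y, A @ A.T


rng = np.random.default_rng(0)
N, n = 100, 400                                # c_N = N / n = 0.25 satisfies A2
X, C = generate_samples(N, n, rng)
S = X @ X.T / n                                # sample covariance matrix S_N
print(X.shape, np.linalg.eigvalsh(C)[[0, -1]])
```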

Note that the assumptions neither require the entries of $y$ to be identically distributed nor impose the existence of a continuous density. These assumptions are adequate for a large range of application scenarios such as factor models in finance or general signal processing models with independent entry-wise non-Gaussian noise, although the requirement of independence in the entries of $y$ is somewhat uncommon in the classical applications of robust estimation theory. The entry-wise independence is however central in this article for the emergence of a concentration of the quadratic forms $\frac1N x_i^*\hat C_N^{-1}x_i$, $i=1,\ldots,n$. Further generalizations, e.g. to elliptical distributions for $x$, would break this effect and would certainly entail a much different asymptotic behavior of $\hat C_N$. These important considerations are left to future work.

Technically, A1-A3 mainly ensure that the eigenvalues of $S_N$ and $\hat C_N$ lie within a compact set away from zero, a.s., for all $N,n$ large, which is a consequence (although not an immediate one) of [11], [14]. Note also that A2 demands $\liminf_n c_N>0$, so that the following results do not contain the results from [6], [18], in which $N$ is fixed and $n\to\infty$, as special cases. With these assumptions, we are now in a position to provide the main technical result of this article.

Theorem 1: Assume A1-A3 and consider the following matrix-valued fixed-point equation in $Z\in\mathbb{C}^{N\times N}$,

$$Z=\frac1n\sum_{i=1}^n u\left(\frac1N x_i^*Z^{-1}x_i\right)x_ix_i^*. \qquad (2)$$

Then, we have the following results.

(I) There exists a unique solution to (2) for all large $N$ a.s. We denote $\hat C_N$ this solution, defined as

$$\hat C_N=\lim_{t\to\infty}Z^{(t)}$$

where $Z^{(0)}=I_N$ and, for $t\in\mathbb{N}$,

$$Z^{(t+1)}=\frac1n\sum_{i=1}^n u\left(\frac1N x_i^*\left(Z^{(t)}\right)^{-1}x_i\right)x_ix_i^*.$$

(II) Defining $\hat C_N=I_N$ when (2) does not have a unique solution, we also have

$$\left\|\phi^{-1}(1)\hat C_N-S_N\right\|\xrightarrow{\text{a.s.}}0.$$

Proof: The proof is provided in Appendix A.
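The iteration in Theorem 1-(I) can be sketched as follows. This is a minimal illustration under assumptions of our own: `fixed_point_scatter` and `quad_forms` are hypothetical helpers, the stopping rule (relative change in spectral norm) is ours, the data are real Gaussian with $C_N=I_N$, and $u(s)=(1+t)/(t+s)$ with $t=2.5$, for which $\phi(s)=1$ exactly at $s=1$, so $\phi^{-1}(1)=1$. The last lines compare $\phi^{-1}(1)\hat C_N$ with $S_N$ as in Theorem 1-(II).

```python
import numpy as np


def quad_forms(X, Z):
    """(1/N) x_i^* Z^{-1} x_i for every column x_i of X."""
    N = X.shape[0]
    return np.real(np.sum(X.conj() * np.linalg.solve(Z, X), axis=0)) / N


def fixed_point_scatter(X, u, tol=1e-10, max_iter=1000):
    """Iterate Z <- (1/n) sum_i u((1/N) x_i^* Z^{-1} x_i) x_i x_i^*, starting from I_N."""
    N, n = X.shape
    Z = np.eye(N, dtype=X.dtype)
    for _ in range(max_iter):
        w = u(quad_forms(X, Z))                  # weights u(d_i)
        Z_new = (X * w) @ X.conj().T / n         # column i of X scaled by w[i]
        converged = np.linalg.norm(Z_new - Z, 2) <= tol * np.linalg.norm(Z_new, 2)
        Z = Z_new
        if converged:
            break
    return Z


t = 2.5


def u(s):
    """u(s) = (1 + t) / (t + s); phi(s) = s u(s) equals 1 at s = 1."""
    return (1.0 + t) / (t + s)


rng = np.random.default_rng(1)
N, n = 100, 400                                  # c_N = 0.25
X = rng.standard_normal((N, n))                  # C_N = I_N in this illustration
S = X @ X.T / n
C_hat = fixed_point_scatter(X, u)
phi_inv_1 = 1.0                                  # phi^{-1}(1) for this u
rel_err = np.linalg.norm(phi_inv_1 * C_hat - S, 2) / np.linalg.norm(S, 2)
print(f"relative error: {rel_err:.3f}")
```

At finite $N,n$ the relative error is small but not zero; Theorem 1-(II) only states that it vanishes as $N,n\to\infty$.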



An immediate corollary of Theorem 1 is the asymptotic closeness of the ordered eigenvalues of $\phi^{-1}(1)\hat C_N$ and $S_N$.

Corollary 1: Under the assumptions of Theorem 1,

$$\max_{i\leq N}\left|\lambda_i\left(\phi^{-1}(1)\hat C_N\right)-\lambda_i(S_N)\right|\xrightarrow{\text{a.s.}}0.$$

Proof: The proof is provided in Appendix A. ■
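Corollary 1 rests on the Weyl-type bound $\max_i|\lambda_i(A)-\lambda_i(B)|\leq\|A-B\|$ for Hermitian $A,B$ (see its proof in Appendix A). A minimal numerical check on random Hermitian matrices; the helper `max_eig_gap`, the sizes, and the perturbation level are arbitrary illustrative choices.

```python
import numpy as np


def max_eig_gap(A, B):
    """max_i |lambda_i(A) - lambda_i(B)| with both spectra in increasing order."""
    return np.max(np.abs(np.linalg.eigvalsh(A) - np.linalg.eigvalsh(B)))


rng = np.random.default_rng(2)
N, n = 50, 200
G = rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))
A = G @ G.conj().T / (2 * n)                       # a complex sample covariance matrix
H = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
E = 1e-2 * (H + H.conj().T) / 2                    # small Hermitian perturbation
gap = max_eig_gap(A, A + E)
print(gap, np.linalg.norm(E, 2))                   # gap never exceeds ||E||
```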

Some comments are called for to understand Theorem 1 in the context of robust M-estimation. 

Theorem 1-(I) can first be compared to the result from Maronna [6, Theorem 1], which states that a solution to (2) exists for each set $\{x_1,\ldots,x_n\}$ under certain conditions on the dimension of the space spanned by the $n$ vectors, as well as on $u(s)$, $N$, and $n$ (in particular, $u(s)$ must satisfy $\phi_\infty>n/(n-N)$ in [6]). Our result may be considered more interesting in practice in the sense that the system sizes $N$ and $n$ no longer condition $\phi_\infty$ and therefore do not constrain the definition of $u(s)$. Theorem 1-(I) can also be compared to the results on uniqueness [6], [18], which hold for all $N,n$ under some further conditions on $u(s)$, such as $\phi(s)$ being strictly increasing [6]. The latter assumption is particularly demanding as it may reject some M-estimators such as the Huber M-estimator, for which $\phi(s)$ is constant for large $s$. Theorem 1-(I) trades these assumptions against a requirement for $N$ and $n$ to be "sufficiently large" and for $\{x_1,\ldots,x_n\}$ to belong to a probability one sequence. Precisely, we demand that there exists an integer $n_0$ depending on the random sequence $\{(x_1,\ldots,x_n)\}_{n=1}^\infty$, such that for all $n\geq n_0$, existence and uniqueness are established under no further condition than the definition (i)-(ii) of $u(s)$ and A1-A3.

Theorem 1-(II), which is our main result, states that, as $N$ and $n$ grow large with a nontrivial limiting ratio, the fixed-point solution $\hat C_N$ (either always defined under the assumptions of [6], [18] or defined a.s. for large enough $N$) becomes asymptotically close to the sample covariance matrix, up to a scaling factor. This implies in particular that, while $\hat C_N$ is an $n$-consistent estimator of (a scaled version of) $C_N$ for $n\to\infty$ and $N$ fixed, in the large $N,n$ regime it has many of the same first order statistics as $S_N$. This suggests that many results holding for $S_N$ in the large $N,n$ regime should also hold for $\hat C_N$, at least concerning first order convergence. The estimators $\hat C_N$, parametrizable through $u$, should then be seen as a class of alternatives to $S_N$. Note also that Theorem 1 holds irrespective of the distribution of the entries of $y$ and of the choice of the function $u$, which is similar to the equivalent result in the classical fixed-$N$ large-$n$ regime.

In terms of applications to signal processing, recall first that the $n$-consistency results on robust estimation [6], [18] imply that many metrics based on functionals of $C_N$ can be consistently estimated by replacing $C_N$ by $\hat C_N$. The inconsistency of the sample covariance matrix as an estimator of the population covariance in the random matrix regime, along with Theorem 1, suggests instead that this approach will in general lead to inconsistent estimators in the large $N,n$ regime, and therefore to inaccurate estimates for moderate values of $N$, $n$, $M$. However, any metric based on $C_N$, and for which an $(N,n)$-consistent estimator involving $S_N$ exists, is very likely to be $(N,n)$-consistently estimated by replacing $S_N$ by $\phi^{-1}(1)\hat C_N$. The interest of this replacement obviously lies in the possibility to improve the metric through an appropriate choice of $u$, in particular when $y$ exhibits outlier behavior or has heavy tails. In the following section, we give a concrete example in the context of MUSIC-like estimation in array processing [19].

Remark 1: In a similar context, it is shown in [11] and [20] that the eigenvalues of $S_N$ are asymptotically contained in the support of their limiting compactly supported distribution if and only if the entries of $y$ have finite fourth order moment. This first suggests that the technical assumption A1, which requires $y$ to have uniformly bounded $8+\eta$ moments, may be relaxed to $y_{ij}$ having only finite fourth order moments for Theorem 1 to hold. This being said, since most of the aforementioned $(N,n)$-consistent estimators involving $\hat C_N$ or $S_N$ rely on a non-degenerate behavior of these eigenvalues (see e.g. [21, Chapters 16-17] for details), the finite fourth order moment condition cannot possibly be further relaxed for these estimators to be usable. As a consequence, although A1 might seem very restrictive in a robust estimation framework as it discards the possibility to consider distributions of $x$ with heavy tail behavior, it is close to a necessary condition for robust estimation in the random matrix regime to be meaningful.

III. Application: Robust G-MUSIC 

Consider $K$ signal sources impinging on a collection of $N$ collocated sensors with angles of arrival $\theta_1,\ldots,\theta_K$. The data $x_t\in\mathbb{C}^N$ received at time $t$ at the array is modeled as

$$x_t=\sum_{k=1}^K\sqrt{p_k}\,s(\theta_k)z_{k,t}+\sigma w_t$$

where $s(\theta)\in\mathbb{C}^N$ is the deterministic unit norm steering vector for signals impinging the sensors at angle $\theta$, $z_{k,t}\in\mathbb{C}$ is the signal source modeled as a zero mean, unit variance, and finite $8+\eta$ order moment random variable, i.i.d. across $t$ and independent across $k$, $p_k>0$ is the transmit power of source $k$ ($p_k<p_{\max}$ for some $p_{\max}>0$) and $\sigma w_t\in\mathbb{C}^N$ is the received noise at time $t$, independent across $t$, with i.i.d. zero mean, variance $\sigma^2>0$, and finite $8+\eta$ order moment entries.
We can write

$$x_t=A_Ny_t,\qquad A_N=\left[S(\Theta)P^{\frac12}\quad\sigma I_N\right]$$

where $S(\Theta)=[s(\theta_1),\ldots,s(\theta_K)]$, $P=\mathrm{diag}(p_1,\ldots,p_K)$, and $y_t=(z_{1,t},\ldots,z_{K,t},w_t^{\mathsf T})^{\mathsf T}\in\mathbb{C}^{N+K}$.
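The factorization $x_t=A_Ny_t$ can be checked numerically. In this sketch, `steering` and `model_matrices` are hypothetical helpers; the steering vector follows the uniform linear array model used in the simulations below, with an extra $1/\sqrt{N}$ factor (our assumption) so that $s(\theta)$ has unit norm as required here. The check confirms that $C_N=A_NA_N^*=S(\Theta)PS(\Theta)^*+\sigma^2I_N$, whose $N-K$ smallest eigenvalues equal $\sigma^2$.

```python
import numpy as np


def steering(theta_deg, N):
    """Unit-norm ULA steering vector, [s(theta)]_k = exp(i pi k sin(theta)) / sqrt(N)."""
    k = np.arange(N)
    return np.exp(1j * np.pi * k * np.sin(np.deg2rad(theta_deg))) / np.sqrt(N)


def model_matrices(thetas_deg, powers, sigma, N):
    """Return S(Theta) (N x K), A_N = [S(Theta) P^(1/2), sigma I_N] and C_N = A_N A_N^*."""
    S_theta = np.column_stack([steering(th, N) for th in thetas_deg])
    A = np.hstack([S_theta @ np.diag(np.sqrt(powers)), sigma * np.eye(N)])
    return S_theta, A, A @ A.conj().T


N, sigma = 10, 0.5
thetas, powers = [10.0, 15.0], [1.0, 1.0]
S_theta, A, C = model_matrices(thetas, powers, sigma, N)
print(np.linalg.eigvalsh(C).round(4))   # N - K eigenvalues equal to sigma^2 = 0.25
```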
Taking $n$ independent observations $x_1,\ldots,x_n$ of the process $x_t$ and assuming $n$, $N$, and $M=N+K$ large according to Assumption A2, Assumptions A1-A3 are met and Theorem 1 can be applied. This yields the following result.

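Theorem 2 (Robust G-MUSIC): Under the current model, denote $E_W\in\mathbb{C}^{N\times(N-K)}$ a matrix containing in columns the eigenvectors of $C_N$ with eigenvalue $\sigma^2$. Also denote $\hat e_k$ the eigenvector of $\hat C_N$ with eigenvalue $\hat\lambda_k=\lambda_k(\hat C_N)$ (recall that $\hat\lambda_1\leq\ldots\leq\hat\lambda_N$), with $\hat C_N$ defined as in Theorem 1 (with $\hat C_N=I_N$ when (2) does not have a unique solution). Then, as $N,n\to\infty$ in the regime of Assumption A2, and $K$ fixed,

$$\hat\gamma(\theta)-\gamma(\theta)\xrightarrow{\text{a.s.}}0$$

where

$$\gamma(\theta)=s(\theta)^*E_WE_W^*s(\theta),\qquad\hat\gamma(\theta)=\sum_{i=1}^N\beta_i\,s(\theta)^*\hat e_i\hat e_i^*s(\theta)$$

and

$$\beta_i=\begin{cases}1+\displaystyle\sum_{k=N-K+1}^N\left(\frac{\hat\lambda_k}{\hat\lambda_i-\hat\lambda_k}-\frac{\hat\mu_k}{\hat\lambda_i-\hat\mu_k}\right), & i\leq N-K\\[2mm] -\displaystyle\sum_{k=1}^{N-K}\left(\frac{\hat\lambda_k}{\hat\lambda_i-\hat\lambda_k}-\frac{\hat\mu_k}{\hat\lambda_i-\hat\mu_k}\right), & i>N-K\end{cases}$$

with $\hat\mu_1\leq\ldots\leq\hat\mu_N$ the eigenvalues of $\mathrm{diag}(\hat\lambda)-\frac1n\sqrt{\hat\lambda}\sqrt{\hat\lambda}^{\mathsf T}$, $\hat\lambda=(\hat\lambda_1,\ldots,\hat\lambda_N)^{\mathsf T}$ (square root taken entrywise).

Proof: The proof can be found in Appendix E. ■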
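The weights $\beta_i$ of Theorem 2 only require the ordered eigenvalues of $\hat C_N$. The sketch below (the function names `gmusic_weights` and `gamma_hat` are ours) computes them and the resulting $\hat\gamma(\theta)$ for one steering vector. Since each term of $\beta_i$ is a ratio of eigenvalues and the $\hat\mu_k$ scale linearly with $\hat\lambda$, the weights should not change if $\hat C_N$ is replaced by $\phi^{-1}(1)\hat C_N$; the checks below also confirm numerically that, for fixed eigenvalues and $n\to\infty$, the weights tend to the MUSIC weights (1 on the $N-K$ smallest eigenvalues, 0 elsewhere).

```python
import numpy as np


def gmusic_weights(lam, n, K):
    """beta_1..beta_N from Theorem 2; lam holds the eigenvalues in increasing order."""
    lam = np.asarray(lam, dtype=float)
    N = lam.size
    r = np.sqrt(lam)
    # mu_1 <= ... <= mu_N: eigenvalues of diag(lam) - (1/n) sqrt(lam) sqrt(lam)^T
    mu = np.linalg.eigvalsh(np.diag(lam) - np.outer(r, r) / n)
    beta = np.empty(N)
    for i in range(N):
        # 0-based: indices 0..N-K-1 are the N - K smallest (noise) eigenvalues
        k = np.arange(N - K, N) if i < N - K else np.arange(0, N - K)
        terms = lam[k] / (lam[i] - lam[k]) - mu[k] / (lam[i] - mu[k])
        beta[i] = 1.0 + terms.sum() if i < N - K else -terms.sum()
    return beta


def gamma_hat(C_hat, n, K, s):
    """hat-gamma(theta) = sum_i beta_i |e_i^* s(theta)|^2 for one steering vector s."""
    lam, E = np.linalg.eigh(C_hat)                 # increasing eigenvalues
    beta = gmusic_weights(lam, n, K)
    return float(np.sum(beta * np.abs(E.conj().T @ s) ** 2))


lam = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 10.0, 20.0])   # illustrative spectrum, N = 7
print(gmusic_weights(lam, n=50, K=2).round(3))
```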

The function $\gamma(\theta)$ is the defining metric for the MUSIC algorithm [1], the zeros of which contain the $\theta_i$, $i\in\{1,\ldots,K\}$. Theorem 2 proves that the $(N,n)$-consistent G-MUSIC estimator of $\gamma(\theta)$ proposed by Mestre in [13] can be extended into a robust G-MUSIC method. The latter consists in replacing the sample covariance matrix $S_N$ as in [13] by the robust estimator $\hat C_N$. The angles $\theta_i$ are then estimated as the deepest minima of $\hat\gamma(\theta)$. This new technique is expected to perform better than either MUSIC or G-MUSIC in the finite $(N,n)$ regime in the case of non-Gaussian noise, for an appropriate choice of the function $u$. Proving so requires the study of the second order statistics of $\hat\gamma(\theta)$, which is left to future work. Note also that our result does not prove the $(N,n)$-consistency of the estimates of $\theta_1,\ldots,\theta_K$, which would require showing

$$\sup_{\theta\in[-\pi,\pi]}\left|\hat\gamma(\theta)-\gamma(\theta)\right|\xrightarrow{\text{a.s.}}0.$$

Proving this convergence requires more advanced results; see [22, Section 4.3.2] for a discussion on this topic.

In the following, we provide comparative performance results between the classical MUSIC, the robust MUSIC, the G-MUSIC, and the robust G-MUSIC algorithms. We recall that the MUSIC algorithm consists in determining the deepest local minima of the function

$$\hat\gamma^\infty(\theta)=\sum_{i=1}^{N-K}s(\theta)^*\hat e_i^\infty\left(\hat e_i^\infty\right)^*s(\theta)$$

where $\hat e_i^\infty$ is the eigenvector associated with the $i$-th smallest eigenvalue of $S_N$ (the notation $\infty$ recalls the fact that $\hat\gamma^\infty(\theta)\to\gamma(\theta)$ a.s. as $n\to\infty$ for $N$ fixed in this setting). Robust MUSIC is equivalent to MUSIC but uses $\hat e_i$ instead of $\hat e_i^\infty$ in the expression of $\hat\gamma^\infty(\theta)$. G-MUSIC determines the local minima of $\hat\gamma(\theta)$ but with $\hat e_i^\infty$ instead of $\hat e_i$ (and with the weights $\beta_i$ computed from the eigenvalues of $S_N$). Finally, robust G-MUSIC seeks the minima of $\hat\gamma(\theta)$, as described in Theorem 2.

Fig. 1. MSE performance of the various MUSIC estimators for K = 1, Gaussian noise, N = 10, and n = 50.
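A minimal end-to-end sketch of MUSIC and robust MUSIC (the eigenvector-only variants described above) on one simulated data matrix. Everything here is an illustrative assumption rather than the article's simulation code: the helper names, the unit-norm steering vector with a $1/\sqrt{N}$ factor, a high SNR, Gaussian noise, a single source, and a $0.05^\circ$ search grid; the weight $u(s)=(1+\nu')/(\nu'+s)$ with $\nu'=2.5$ matches the simulations described below.

```python
import numpy as np


def steering(theta_deg, N):
    k = np.arange(N)
    return np.exp(1j * np.pi * k * np.sin(np.deg2rad(theta_deg))) / np.sqrt(N)


def robust_scatter(X, nu_p=2.5, tol=1e-9, max_iter=500):
    """Fixed-point iteration of (2) with u(s) = (1 + nu') / (nu' + s)."""
    N, n = X.shape
    Z = np.eye(N, dtype=complex)
    for _ in range(max_iter):
        d = np.real(np.sum(X.conj() * np.linalg.solve(Z, X), axis=0)) / N
        Z_new = (X * ((1.0 + nu_p) / (nu_p + d))) @ X.conj().T / n
        done = np.linalg.norm(Z_new - Z, 2) <= tol * np.linalg.norm(Z_new, 2)
        Z = Z_new
        if done:
            break
    return Z


def music_estimate(C, K, grid_deg):
    """Angle minimizing ||E_n^* s(theta)||^2, E_n spanning the eigenvectors of the
    N - K smallest eigenvalues of C (global minimum only, so K = 1 here)."""
    N = C.shape[0]
    _, E = np.linalg.eigh(C)
    En = E[:, : N - K]
    spectrum = [np.linalg.norm(En.conj().T @ steering(th, N)) ** 2 for th in grid_deg]
    return grid_deg[int(np.argmin(spectrum))]


rng = np.random.default_rng(3)
N, n, theta1, sigma = 10, 50, 18.0, 0.05
z = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
W = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
X = np.outer(steering(theta1, N), z) + sigma * W     # p_1 = 1, K = 1
grid = np.linspace(-90.0, 90.0, 3601)
theta_music = music_estimate(X @ X.conj().T / n, 1, grid)
theta_robust = music_estimate(robust_scatter(X), 1, grid)
print(theta_music, theta_robust)
```

With $K>1$, one would search for the $K$ deepest local minima instead of the global minimum.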

We take $z_{k,t}$ standard Gaussian, independent across $k$ and $t$, and $w_t$ a vector with independent zero-mean unit variance entries $w_{it}$, such that $|w_{it}|^2$ has a K-distribution with shape parameter $\nu$ and scale parameter $1/\nu$, i.e.

$$w_{it}=\sqrt{g_{it}}\,\tilde w_{it}$$

where $\tilde w_{it}\sim\mathcal{CN}(0,1)$ and $g_{it}\sim\Gamma(\nu,1/\nu)$ are independent random variables. Note that the entries of $w_t$ have moments of all orders, with in particular a fourth order moment given by

$$\mathrm{E}\left[|w_{it}|^4\right]=\frac{2(1+\nu)}{\nu}.$$
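The noise model can be sampled directly from its definition. A minimal sketch (the helper names `k_noise` and `fourth_moment` are ours) that draws $w_{it}=\sqrt{g_{it}}\,\tilde w_{it}$ using NumPy's shape/scale Gamma parametrization, then checks the unit variance and the fourth order moment $2(1+\nu)/\nu$ by Monte Carlo:

```python
import numpy as np


def k_noise(size, nu, rng):
    """w = sqrt(g) * w_tilde, w_tilde ~ CN(0, 1), g ~ Gamma(shape=nu, scale=1/nu)."""
    w_tilde = (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2.0)
    g = rng.gamma(shape=nu, scale=1.0 / nu, size=size)   # E[g] = 1, Var[g] = 1/nu
    return np.sqrt(g) * w_tilde


def fourth_moment(nu):
    """E|w|^4 = E[g^2] E|w_tilde|^4 = (1 + 1/nu) * 2."""
    return 2.0 * (1.0 + nu) / nu


rng = np.random.default_rng(4)
w = k_noise(10**6, nu=1.0, rng=rng)
print(np.mean(np.abs(w) ** 2), np.mean(np.abs(w) ** 4), fourth_moment(1.0))
print(fourth_moment(0.1), fourth_moment(0.02))   # about 22 and 102, as used below
```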

The case where $w_t$ has purely Gaussian entries, i.e. when $\nu\to\infty$, will be used as a reference scenario. When $\nu$ is large, $w_t$ models the realistic scenario of a sensor array with close-to-Gaussian noise. For $\nu$ small, and in particular for $\nu<1$ (resulting in a noise distribution with large high order moments), the scenario can either be used to reflect independent antenna reading errors in a sensor array or to model a distributed sensor network in which each sensor faces independent impulsive noise (e.g. in a MIMO-STAP setting [23], [24]). We choose $u(s)=(1+\nu')/(\nu'+s)$, for some $\nu'>0$ which controls the degree of robustness of the estimator ($\nu'\to\infty$ brings $u(s)\to1$, hence reduced robustness). We set here $\nu'=2.5$ in all simulations. We model the steering vectors by $[s(\theta)]_k=\exp(\imath\pi k\sin(\theta))$ as in a uniform linear array of $N$ elements with half wavelength inter-element spacing. We take $N=10$, $n=50$,
and $p_k=1$ for all $k$. Under these conditions, $\hat C_N$ satisfies [6, Assumption (E)] for $\nu'>0.25$, implying that $\hat C_N$ is well defined for each $x_1,\ldots,x_n$ and not only for all large $n$ a.s.
Fig. 2. MSE performance of the various MUSIC estimators for K = 1, noise with ν = 0.1, N = 10, and n = 50.
Fig. 3. MSE performance of the various MUSIC estimators for K = 1, noise with ν = 0.02, N = 10, and n = 50.



We first consider $K=1$ with $\theta_1=18^\circ$. Figure 1, Figure 2, and Figure 3 depict the mean-square error (MSE) performance $\mathrm{E}[|\hat\theta_1-\theta_1|^2]$ of the above estimators, as a function of the signal-to-noise ratio (SNR) $\sigma^{-2}$. In Figure 1, we take $w_t$ Gaussian, i.e. with $\nu\to\infty$, for which $\mathrm{E}[|w_{it}|^4]=2$. In Figure 2, we set $\nu=0.1$, implying $\mathrm{E}[|w_{it}|^4]=22$, and call this case the intermediate noise scenario. Finally, in Figure 3, we take $\nu=0.02$, and then $\mathrm{E}[|w_{it}|^4]=102$, and refer to this case as the impulsive noise scenario. We naturally expect the robust techniques to bring larger performance gains in the latter scenario than in the Gaussian and intermediate ones. The simulations are based on 50 000 Monte Carlo simulations per SNR value. We first observe that both robust methods perform very similarly to their non-robust counterparts in a Gaussian noise setting. In the intermediate noise setting, the robust approaches outperform the non-robust ones, especially in the low-to-medium SNR region where we see a significant performance advantage of robust G-MUSIC over G-MUSIC. In the impulsive noise scenario, both robust methods show a large gain compared to the non-robust ones. In this regime, the random matrix advantage of G-MUSIC over MUSIC disappears completely, while it remains largely favorable within the robust scheme. The latter two results reflect the fact that, if the noise non-Gaussianity and the small sample size are not both appropriately controlled, one of the two will overtake the other, making G-MUSIC or robust MUSIC inefficient. On the contrary, robust G-MUSIC, which controls both problems, always brings a significant performance advantage.

Fig. 4. Resolution performance of the various MUSIC estimators, θ1 = 10°, θ2 = 15°, noise with ν = 0.1, N = 10, and n = 50.

In Figure 4, we depict the resolution performance of the MUSIC estimators for two close sources. For this, we take $K=2$, $\theta_1=10^\circ$, $\theta_2=15^\circ$, and $\nu=0.1$, i.e. $\mathrm{E}[|w_{it}|^4]=22$. The curves show the probability of detecting exactly two local minima of $\hat\gamma$ (or $\hat\gamma^\infty$) within $[5^\circ,20^\circ]$, based on 50 000 Monte Carlo simulations for each SNR value. In this intermediate noise setting, the robust G-MUSIC algorithm shows a much stronger resolution power than G-MUSIC, although both operate at similar MSE.

The robust G-MUSIC example is an illustrative application of Theorem 1 demonstrating the strong advantage brought by a joint robust and random matrix-based signal processing framework. The theoretical performance gains are however not easy to obtain as they would require the elaboration of central limit theorems (CLT). In the robust G-MUSIC example, this demands a CLT for the quantity $n(\hat\theta-\theta)$, which requires more advanced tools than those presented in this article.
IV. Conclusion

We have proved that a large family of robust estimates of population covariance matrices is consistent with the sample covariance matrix in the regime of both large population $N$ and sample $n$ sizes, this being valid irrespective of the sample distribution. This result opens up a new area of research for robust estimators in the random matrix regime. We applied this consistency result to prove that a robust version of the G-MUSIC estimator due to Mestre is still an $(N,n)$-consistent estimator of the direction-of-arrival metric in array processing. The simulation results suggest that the induced robust G-estimator performs better than the MUSIC and G-MUSIC estimators under non-Gaussian noise and for $N$ not small compared to $n$. Future work will aim to prove this claim via the study of second order statistics.

Appendix A 
Proof of Theorem 1 and Corollary 1 

Proof of Theorem 1: In order to prove the existence and uniqueness of a solution to (2) for all large n, we 
use the framework of standard interference functions from [25]. 

Definition 1: A function $h=(h_1,\ldots,h_n):\mathbb{R}^n\to\mathbb{R}^n$ is said to be a standard interference function if it fulfills the following conditions:

1) Positivity: if $q_1,\ldots,q_n\geq0$, then $h_j(q_1,\ldots,q_n)>0$ for all $j$.

2) Monotonicity: if $q_1\geq q_1',\ldots,q_n\geq q_n'$, then, for all $j$, $h_j(q_1,\ldots,q_n)\geq h_j(q_1',\ldots,q_n')$.

3) Scalability: for all $a>1$ and for all $j$, $ah_j(q_1,\ldots,q_n)\geq h_j(aq_1,\ldots,aq_n)$.

Theorem 3: If an $n$-variate function $h(q_1,\ldots,q_n)$ is a standard interference function and there exists $(q_1,\ldots,q_n)$ such that, for all $j$, $q_j>h_j(q_1,\ldots,q_n)$, then the system of equations

$$q_j=h_j(q_1,\ldots,q_n) \qquad (3)$$

for $j=1,\ldots,n$, has at least one solution, given by $\lim_{t\to\infty}(q_1^{(t)},\ldots,q_n^{(t)})$, where

$$q_j^{(t)}=h_j\left(q_1^{(t-1)},\ldots,q_n^{(t-1)}\right)$$

for $t\geq1$ and any initial values $q_1^{(0)},\ldots,q_n^{(0)}>0$.

Proof: The proof is provided in Appendix D. ■
Remark 2: Note that our definition of a standard interference function differs from that of [25], in which the scalability requirement reads: for all $j$, $ah_j(q_1,\ldots,q_n)>h_j(aq_1,\ldots,aq_n)$. Changing the strict inequality to a loose one alters the consequences of the theorem above, where only existence is ensured. However, for our present purposes with $\phi(s)$ possibly possessing a flat region, requesting a strict inequality would be too demanding.
Since $\{x_1,\ldots,x_n\}$ spans $\mathbb{C}^N$ for all large $n$ a.s. (as a consequence of Proposition 2 in Appendix F), we can define for these $n$ the functions $h_j$, $j=1,\ldots,n$,

$$h_j(q_1,\ldots,q_n)=\frac1N x_j^*\left(\frac1n\sum_{i=1}^n u(q_i)x_ix_i^*\right)^{-1}x_j. \qquad (4)$$
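The map (4) and the iteration of Theorem 3 translate directly into code. In this minimal sketch (helper names `h` and `iterate` are ours), `h` evaluates all $h_j$ at once and `iterate` runs $q^{(t)}=h(q^{(t-1)})$; the weight $u(s)=(1+t)/(t+s)$ with $t=2.5$, the sizes, and the Gaussian data are illustrative assumptions. Starting from two different positive initial points, both runs reach the same fixed point, in line with the uniqueness argument given later in this appendix.

```python
import numpy as np


def h(q, X, u):
    """h_j(q) = (1/N) x_j^* ((1/n) sum_i u(q_i) x_i x_i^*)^{-1} x_j for all j, as in (4)."""
    N, n = X.shape
    M = (X * u(q)) @ X.conj().T / n
    return np.real(np.sum(X.conj() * np.linalg.solve(M, X), axis=0)) / N


def iterate(X, u, q0, tol=1e-12, max_iter=2000):
    """Fixed-point iteration q^(t) = h(q^(t-1)) of Theorem 3."""
    q = np.asarray(q0, dtype=float)
    for _ in range(max_iter):
        q_new = h(q, X, u)
        if np.max(np.abs(q_new - q)) <= tol * np.max(q_new):
            return q_new
        q = q_new
    return q


t = 2.5


def u(s):
    return (1.0 + t) / (t + s)


rng = np.random.default_rng(5)
N, n = 50, 200
X = rng.standard_normal((N, n))
q_a = iterate(X, u, np.full(n, 0.1))
q_b = iterate(X, u, np.full(n, 10.0))
print(np.max(np.abs(q_a - q_b)))        # both runs reach the same fixed point
```

Given the limit $q$, the matrix $\frac1n\sum_i u(q_i)x_ix_i^*$ is the corresponding solution of (2).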



We first show that $h=(h_1,\ldots,h_n)$ meets the conditions of Theorem 3 for all large $n$ a.s. Due to A1, from standard arguments using the Markov inequality and the Borel-Cantelli lemma, we have that $\min_{i\leq n}\|x_i\|>0$ for all large $n$ a.s. (this is also a corollary of Lemma 2 below). Therefore, we clearly have $h_j>0$ for all $j$, for all large $n$ a.s. Also, since $u$ is nonincreasing, taking $q_1,\ldots,q_n$ and $q_1',\ldots,q_n'$ such that $q_i'\geq q_i\geq0$ for all $i$, we have $u(q_i')\leq u(q_i)$ and then

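$$\frac1n\sum_{i=1}^n u(q_i)x_ix_i^*\succeq\frac1n\sum_{i=1}^n u(q_i')x_ix_i^*.$$

From [26, Corollary 7.7.4], this implies

$$\left(\frac1n\sum_{i=1}^n u(q_i)x_ix_i^*\right)^{-1}\preceq\left(\frac1n\sum_{i=1}^n u(q_i')x_ix_i^*\right)^{-1}$$

from which $h_j(q_1',\ldots,q_n')\geq h_j(q_1,\ldots,q_n)$, proving the monotonicity of $h$.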
For $a>1$, $\phi(aq_i)\geq\phi(q_i)$, so that $u(aq_i)\geq u(q_i)/a$. Therefore

$$\frac1n\sum_{i=1}^n u(aq_i)x_ix_i^*\succeq\frac1a\,\frac1n\sum_{i=1}^n u(q_i)x_ix_i^*.$$

From [26, Corollary 7.7.4] again, we then have

$$a\left(\frac1n\sum_{i=1}^n u(q_i)x_ix_i^*\right)^{-1}\succeq\left(\frac1n\sum_{i=1}^n u(aq_i)x_ix_i^*\right)^{-1}$$

so that $ah_j(q_1,\ldots,q_n)\geq h_j(aq_1,\ldots,aq_n)$. Therefore $h$ is a standard interference function. In order to prove that (3), with $h_j$ given by (4), admits a solution, from Theorem 3, we now need to prove that there exists $(q_1,\ldots,q_n)$ such that, for all $j$, $q_j>h_j(q_1,\ldots,q_n)$. Note that this may not hold for all fixed $N,n$, as discussed in [6, p. 54]. We will prove instead that a solution exists for all large $n$ a.s.
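The three properties of Definition 1 can also be checked numerically for (4). A minimal sketch under illustrative assumptions (Gaussian data, $u(s)=(1+t)/(t+s)$ with $t=2.5$, random nonnegative $q$, and $a=1.7$); it checks numbers on one instance and is not a substitute for the argument above.

```python
import numpy as np


def h(q, X, u):
    """Vector (h_1(q), ..., h_n(q)) from (4)."""
    N, n = X.shape
    M = (X * u(q)) @ X.conj().T / n
    return np.real(np.sum(X.conj() * np.linalg.solve(M, X), axis=0)) / N


def u(s, t=2.5):
    return (1.0 + t) / (t + s)


rng = np.random.default_rng(6)
N, n, a = 40, 160, 1.7
X = rng.standard_normal((N, n))
q = rng.uniform(0.0, 3.0, n)
q_prime = q + rng.uniform(0.0, 1.0, n)             # q' >= q entrywise
positivity = np.all(h(q, X, u) > 0)
monotonicity = np.all(h(q_prime, X, u) >= h(q, X, u) - 1e-12)
scalability = np.all(a * h(q, X, u) >= h(a * q, X, u) - 1e-12)
print(positivity, monotonicity, scalability)
```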

To proceed, we need random matrix results and additional notations. Take $c_-,c_+$ such that $0<c_-<\liminf_N c_N$ and $\limsup_N c_N<c_+<1$, and denote $X_{(i)}=[x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_n]\in\mathbb{C}^{N\times(n-1)}$. We start with the following fundamental lemmas, which allow for a control of the joint convergence of the quadratic forms $\frac1N x_i^*S_N^{-1}x_i-1$.

Lemma 1: Assume A1-A3. There exists $\varepsilon>0$ such that

$$\min_{i\leq n}\lambda_1\left(\frac1n X_{(i)}X_{(i)}^*\right)>\varepsilon$$

for all large $n$ a.s.

Proof: The proof is provided in Appendix B.

Lemma 2: Assume A1-A3. Then, a.s.,

$$\max_{i\leq n}\left|\frac1N x_i^*S_N^{-1}x_i-1\right|\longrightarrow0.$$

Proof: The proof is provided in Appendix C. ■
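Lemma 2 can be visualized by computing the quadratic forms $\frac1N x_i^*S_N^{-1}x_i$ at two problem sizes with the same ratio $c_N=1/4$. A minimal sketch with real Gaussian data (our choice; `quad_forms_scm` is a hypothetical helper): the sample mean of the forms equals $\frac1N\operatorname{tr}(S_N^{-1}S_N)=1$ exactly, while their maximal deviation from 1 shrinks as $N$ and $n$ grow.

```python
import numpy as np


def quad_forms_scm(X):
    """(1/N) x_i^* S_N^{-1} x_i for all i, with S_N = (1/n) X X^*."""
    N, n = X.shape
    S = X @ X.conj().T / n
    return np.real(np.sum(X.conj() * np.linalg.solve(S, X), axis=0)) / N


rng = np.random.default_rng(7)
results = {}
for N in (50, 400):
    n = 4 * N
    qf = quad_forms_scm(rng.standard_normal((N, n)))
    results[N] = (qf.mean(), np.max(np.abs(qf - 1.0)))   # (mean, max deviation)
    print(N, n, results[N])
```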

Let $q_1=\ldots=q_n=q>0$. Then,

$$h_i(q_1,\ldots,q_n)=\frac{1}{u(q)}\frac1N x_i^*S_N^{-1}x_i=\frac{q}{\phi(q)}\frac1N x_i^*S_N^{-1}x_i.$$

Take $\varepsilon>0$ such that $(1+\varepsilon)/(\phi_\infty-\varepsilon)<1$. This is always possible since $\phi_\infty>1$. Choose now $q$ such that $\phi(q)=\phi_\infty-\varepsilon$, which also exists since $\phi$ is increasing on $[0,\phi^{-1}(\phi_\infty^-))$ with image $[0,\phi_\infty)$. From Lemma 2, for all large $n$ a.s.,

$$\max_{i\leq n}\frac1N x_i^*S_N^{-1}x_i<1+\varepsilon.$$

Therefore,

$$\frac1q h_i(q,\ldots,q)=\frac{1}{\phi_\infty-\varepsilon}\,\frac1N x_i^*S_N^{-1}x_i<\frac{1+\varepsilon}{\phi_\infty-\varepsilon}<1$$

from which $h_i(q,\ldots,q)<q$ for all $i$. From Theorem 3, we therefore prove the existence of a solution to (3) with $h_j$ given in (4). Since these quadratic forms define the solutions of the fixed-point equation (2) (given a solution $(q_1,\ldots,q_n)$ of (3), the matrix $Z=\frac1n\sum_{i=1}^n u(q_i)x_ix_i^*$ solves (2)), this proves the existence of a solution $\hat C_N$ for all large $n$ a.s. Note that Lemma 2 is crucial here and that, for $\phi_\infty$ close to one, there is little hope to prove existence for all fixed $N,n$, consistently with the results [6], [18].
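The existence argument can be traced numerically. For $u(s)=(1+t)/(t+s)$ (an illustrative choice with $\phi_\infty=1+t$), the equation $\phi(q)=\phi_\infty-\varepsilon$ has the closed-form solution $q=t(1+t-\varepsilon)/\varepsilon$. The sketch below picks $\varepsilon$ with $(1+\varepsilon)/(\phi_\infty-\varepsilon)<1$, evaluates (4) at $(q,\ldots,q)$ on simulated Gaussian data (our assumption), and checks $h_i(q,\ldots,q)<q$ for all $i$, the condition required by Theorem 3.

```python
import numpy as np

t, eps = 2.5, 0.5
phi_inf = 1.0 + t
assert (1.0 + eps) / (phi_inf - eps) < 1.0        # condition on epsilon in the proof


def u(s):
    return (1.0 + t) / (t + s)


q = t * (1.0 + t - eps) / eps                      # solves phi(q) = phi_inf - eps
rng = np.random.default_rng(8)
N, n = 100, 400
X = rng.standard_normal((N, n))
M = (X * u(np.full(n, q))) @ X.T / n               # (1/n) sum_i u(q) x_i x_i^* = u(q) S_N
h_vals = np.sum(X * np.linalg.solve(M, X), axis=0) / N
qf = h_vals * u(q)                                 # (1/N) x_i^* S_N^{-1} x_i
print(q, h_vals.max(), qf.max())                   # need h_i(q, ..., q) < q
```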

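We now prove uniqueness. Take a solution $\hat C_N$ and denote $d_i=\frac1N x_i^*\hat C_N^{-1}x_i$, which we order as $d_1\leq\ldots\leq d_n$ without loss of generality. Denote also $D=\mathrm{diag}(\{u(d_i)\}_{i=1}^n)$. By definition,

$$d_i=\frac1N x_i^*\left(\frac1n XDX^*\right)^{-1}x_i.$$

From the nonincreasing property of $u$, we have the inequality

$$XDX^*\succeq u(d_n)XX^*$$

which implies after inversion

$$\frac{1}{u(d_n)}\left(XX^*\right)^{-1}\succeq\left(XDX^*\right)^{-1}$$

and therefore, recalling that $n^{-1}XX^*=S_N$,

$$d_n\leq\frac{1}{u(d_n)}\frac1N x_n^*S_N^{-1}x_n$$

or equivalently, since $u(d_n)>0$,

$$\phi(d_n)\leq\frac1N x_n^*S_N^{-1}x_n.$$

Similarly, since $XDX^*\preceq u(d_1)XX^*$,

$$d_1\geq\frac{1}{u(d_1)}\frac1N x_1^*S_N^{-1}x_1$$

from which we also have

$$\phi(d_1)\geq\frac1N x_1^*S_N^{-1}x_1.$$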
Since $\phi$ is nondecreasing, we also have $\phi(d_1)\leq\phi(d_i)\leq\phi(d_n)$ for $i\leq n$, and we therefore obtain

$$\frac1N x_1^*S_N^{-1}x_1\leq\phi(d_i)\leq\frac1N x_n^*S_N^{-1}x_n.$$

Take $0<\varepsilon<\min\{1,\phi_\infty-1\}$. From Lemma 2, for all large $n$ a.s.,

$$0<1-\varepsilon<\phi(d_i)<1+\varepsilon<\phi_\infty.$$

Since $\phi$ is continuous and increasing on $(0,\phi^{-1}(\phi_\infty^-))$ with image contained in $(0,\phi_\infty)$, $\phi$ is invertible there and we obtain that, for all large $n$ a.s.,

$$\phi^{-1}(1-\varepsilon)<d_i<\phi^{-1}(1+\varepsilon). \qquad (5)$$

We can now prove the almost sure uniqueness of $\hat C_N$ for all large $n$. Take $\varepsilon$ in (5) to satisfy the previous conditions and to be such that $(\phi^{-1}(1+\varepsilon))^2/\phi^{-1}(1-\varepsilon)<\phi^{-1}(\phi_\infty^-)$, which is always possible as the left-hand side expression is continuous in $\varepsilon$ with limit $\phi^{-1}(1)<\phi^{-1}(\phi_\infty^-)$ as $\varepsilon\to0$.
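The sandwich $\frac1N x_1^*S_N^{-1}x_1\leq\phi(d_i)\leq\frac1N x_n^*S_N^{-1}x_n$ (indices sorted by $d_i$) holds for any solution of (2) and any $N,n$, so it can be checked on a computed fixed point. A minimal sketch under illustrative assumptions (the helper `quad` is ours, Gaussian data, $u(s)=(1+t)/(t+s)$ with $t=2.5$):

```python
import numpy as np

t = 2.5


def u(s):
    return (1.0 + t) / (t + s)


def quad(X, Z):
    """(1/N) x_i^* Z^{-1} x_i for all columns."""
    return np.real(np.sum(X.conj() * np.linalg.solve(Z, X), axis=0)) / X.shape[0]


rng = np.random.default_rng(9)
N, n = 60, 240
X = rng.standard_normal((N, n))
S = X @ X.T / n
Z = np.eye(N)
for _ in range(2000):                                  # fixed-point iteration of (2)
    Z_new = (X * u(quad(X, Z))) @ X.T / n
    done = np.linalg.norm(Z_new - Z, 2) <= 1e-12 * np.linalg.norm(Z_new, 2)
    Z = Z_new
    if done:
        break
d = quad(X, Z)                                          # d_i at the fixed point
qf = quad(X, S)                                         # (1/N) x_i^* S_N^{-1} x_i
phi_d = d * u(d)
lo, hi = qf[np.argmin(d)], qf[np.argmax(d)]
print(lo, phi_d.min(), phi_d.max(), hi)
```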

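We now follow the arguments of [25, Theorem 1]. Assume $(d_1^{(1)}, \ldots, d_n^{(1)})$ and $(d_1^{(2)}, \ldots, d_n^{(2)})$ are two distinct solutions of the fixed-point equation $d_j = h_j(d_1, \ldots, d_n)$ for $j = 1, \ldots, n$, where $h_j$ is defined by (4). Then (up to a change in the indices 1 and 2), there exists $k$ such that, for some $\alpha > 1$, $\alpha d_k^{(1)} = d_k^{(2)}$ and $\alpha d_i^{(1)} \ge d_i^{(2)}$ for $i \ne k$. From (5), for sufficiently large $n$ a.s., the ratio $\alpha = d_k^{(2)} / d_k^{(1)}$ is also constrained to satisfy $\alpha < \phi^{-1}(1 + \varepsilon) / \phi^{-1}(1 - \varepsilon)$. Using this inequality and the upper bound in (5), we have for all $j$

$$0 < \alpha d_j^{(1)} < \frac{\left( \phi^{-1}(1 + \varepsilon) \right)^2}{\phi^{-1}(1 - \varepsilon)} < \phi^{-1}(\phi_\infty^-).$$

Since $\phi$ is increasing on $(0, \phi^{-1}(\phi_\infty^-))$, we have in particular $\phi(\alpha d_j^{(1)}) > \phi(d_j^{(1)})$, from which $\alpha u(\alpha d_j^{(1)}) > u(d_j^{(1)})$ for all $j$ and then, with similar arguments as previously, $\alpha h_j(d_1^{(1)}, \ldots, d_n^{(1)}) > h_j(\alpha d_1^{(1)}, \ldots, \alpha d_n^{(1)})$ for all $j$. Using the monotonicity of $h$, we conclude in particular

$$d_k^{(2)} = h_k(d_1^{(2)}, \ldots, d_n^{(2)}) \le h_k(\alpha d_1^{(1)}, \ldots, \alpha d_n^{(1)}) < \alpha h_k(d_1^{(1)}, \ldots, d_n^{(1)}) = \alpha d_k^{(1)},$$

which contradicts $\alpha d_k^{(1)} = d_k^{(2)}$ and proves the uniqueness of $\hat C_N$ and Part (I) of Theorem 1.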
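The two properties of $h$ invoked above (monotonicity, and the strict scalability $\alpha h_j(d) > h_j(\alpha d)$ for $\alpha > 1$) can be observed numerically. This is a sketch under the same hypothetical assumptions as before: $u(x) = 3/(1+x)$, real Gaussian data with $C_N = I_N$, and $h_j(q) = \frac1N x_j^* (\frac1n \sum_i u(q_i) x_i x_i^*)^{-1} x_j$ as our reading of (4). For this particular $u$, $\phi$ is increasing on all of $(0, \infty)$, so the range restriction coming from (5) is not binding.

```python
import numpy as np

def u(x):
    # Hypothetical weight; phi(x) = x u(x) = 3x / (1 + x) is increasing on (0, inf).
    return 3.0 / (1.0 + x)

def h(q, X):
    """Vector of h_j(q) = (1/N) x_j^T M(q)^{-1} x_j with M(q) = (1/n) sum_i u(q_i) x_i x_i^T."""
    N, n = X.shape
    M = (X * u(q)) @ X.T / n
    return np.sum(X * np.linalg.solve(M, X), axis=0) / N

rng = np.random.default_rng(2)
N, n = 50, 200
X = rng.standard_normal((N, n))

d = rng.uniform(0.4, 0.6, size=n)             # points around phi^{-1}(1) = 0.5
d_big = d + rng.uniform(0.01, 0.1, size=n)    # a componentwise larger point
alpha = 1.2

monotone = bool(np.all(h(d, X) <= h(d_big, X)))             # q <= q' implies h(q) <= h(q')
scalable = bool(np.all(alpha * h(d, X) > h(alpha * d, X)))  # alpha h(q) > h(alpha q)
print(monotone, scalable)
```

Scalability here follows from $\alpha u(\alpha x) - u(x) = 3(\alpha - 1)/((1 + \alpha x)(1 + x)) > 0$, which is the scalar inequality $\alpha u(\alpha d_j) > u(d_j)$ used in the proof.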

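We now prove Part (II) of the theorem. In order to proceed, we start again from (5). Since $\varepsilon$ is arbitrary, we conclude that

$$\max_{i \le n} \left| d_i - \phi^{-1}(1) \right| \xrightarrow{\text{a.s.}} 0.$$

Applying the continuous mapping theorem, we then have

$$\max_{i \le n} \left| u(d_i) - u(\phi^{-1}(1)) \right| \xrightarrow{\text{a.s.}} 0.$$

Noticing that $\phi(\phi^{-1}(1)) = 1$, and therefore that $u(\phi^{-1}(1)) = 1/\phi^{-1}(1)$, this can be rewritten

$$\max_{i \le n} \left| u(d_i) - \frac{1}{\phi^{-1}(1)} \right| \xrightarrow{\text{a.s.}} 0. \qquad (6)$$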
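Now, we also have the matrix inequalities

$$\min_{i \le n} \left\{ u(d_i) - \frac{1}{\phi^{-1}(1)} \right\} \frac1n XX^* \preceq \hat C_N - \frac{1}{\phi^{-1}(1)} S_N \preceq \max_{i \le n} \left\{ u(d_i) - \frac{1}{\phi^{-1}(1)} \right\} \frac1n XX^*.$$

From Proposition 2 in Appendix F, $\left\| \frac1n XX^* \right\| < K$ for some $K > 0$ and for all large $n$ a.s. From (6), we then conclude that

$$\left\| \hat C_N - \frac{1}{\phi^{-1}(1)} S_N \right\| \xrightarrow{\text{a.s.}} 0,$$

which completes the proof of Theorem 1.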
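Part (II) can be visualized on a finite-size example. The sketch below computes $\hat C_N$ by the plain fixed-point iteration $C \leftarrow \frac1n \sum_i u(\frac1N x_i^* C^{-1} x_i) x_i x_i^*$ started from $S_N$, and compares it with $S_N / \phi^{-1}(1)$ in spectral norm. Everything specific is an assumption made for illustration: the hypothetical weight $u(x) = 3/(1+x)$ (for which $\phi^{-1}(1) = 1/2$, so $\hat C_N$ should be close to $2 S_N$), the spectrum of $C_N$, and real Gaussian data.

```python
import numpy as np

def u(x):
    # Hypothetical weight: phi(x) = 3x / (1 + x), so phi_inf = 3 and phi^{-1}(1) = 1/2.
    return 3.0 / (1.0 + x)

PHI_INV_1 = 0.5

def maronna(X, u, tol=1e-10, max_iter=2000):
    """Iterate C <- (1/n) sum_i u((1/N) x_i^T C^{-1} x_i) x_i x_i^T until it stabilizes."""
    N, n = X.shape
    C = X @ X.T / n                                   # start from S_N
    for _ in range(max_iter):
        d = np.sum(X * np.linalg.solve(C, X), axis=0) / N
        C_new = (X * u(d)) @ X.T / n
        if np.linalg.norm(C_new - C) <= tol * np.linalg.norm(C_new):
            return C_new, d
        C = C_new
    raise RuntimeError("fixed-point iteration did not converge")

rng = np.random.default_rng(0)
N, n = 200, 800
A = np.diag(np.sqrt(np.linspace(1.0, 3.0, N)))       # hypothetical A_N, C_N = A_N A_N^T
X = A @ rng.standard_normal((N, n))
S = X @ X.T / n

C_hat, d = maronna(X, u)
target = S / PHI_INV_1
rel_err = np.linalg.norm(C_hat - target, 2) / np.linalg.norm(target, 2)
print(rel_err, np.abs(d - PHI_INV_1).max(), np.mean(d * u(d)))
```

At the fixed point, taking the trace of $\hat C_N^{-1} \hat C_N = I_N$ gives exactly $\frac1n \sum_i \phi(d_i) = 1$, which is consistent with the $d_i$ clustering around $\phi^{-1}(1)$; the printed relative error is a finite-size counterpart of Part (II), not a rate.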

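Proof of Corollary 1: The proof relies on [26, Theorem 4.3.7], according to which, for $1 \le i \le N$ and eigenvalues ordered increasingly,

$$\lambda_i(S_N) \le \lambda_i\left( \phi^{-1}(1) \hat C_N \right) + \lambda_N\left( S_N - \phi^{-1}(1) \hat C_N \right)$$

$$\lambda_i(S_N) \ge \lambda_i\left( \phi^{-1}(1) \hat C_N \right) - \lambda_N\left( \phi^{-1}(1) \hat C_N - S_N \right).$$

The result follows by noticing that the second term in both right-hand sides tends to zero a.s. according to Theorem 1.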
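The Weyl-type inequalities behind Corollary 1 can be checked on arbitrary Hermitian matrices. In this sketch, which uses generic random symmetric matrices rather than the estimators themselves, $A$ stands for $\phi^{-1}(1) \hat C_N$ and a small $B$ for $S_N - \phi^{-1}(1) \hat C_N$; eigenvalues are sorted increasingly, the convention in which the two bounds are written. A direct consequence is $\max_i |\lambda_i(A + B) - \lambda_i(A)| \le \|B\|$.

```python
import numpy as np

def random_symmetric(rng, N):
    G = rng.standard_normal((N, N))
    return (G + G.T) / 2.0

rng = np.random.default_rng(3)
N = 30
A = random_symmetric(rng, N)            # stands for phi^{-1}(1) C_hat_N
B = 1e-2 * random_symmetric(rng, N)     # stands for S_N - phi^{-1}(1) C_hat_N

lam_A = np.linalg.eigvalsh(A)            # eigenvalues in increasing order
lam_AB = np.linalg.eigvalsh(A + B)
lam_B = np.linalg.eigvalsh(B)

# Weyl: lambda_i(A) + lambda_1(B) <= lambda_i(A + B) <= lambda_i(A) + lambda_N(B)
upper_ok = bool(np.all(lam_AB <= lam_A + lam_B[-1] + 1e-10))
lower_ok = bool(np.all(lam_AB >= lam_A + lam_B[0] - 1e-10))
max_shift = np.abs(lam_AB - lam_A).max()
print(upper_ok, lower_ok, max_shift, np.linalg.norm(B, 2))
```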



Appendix B 
Proof of Lemma 1 

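If the set of the eigenvalues of $\frac1n X_{(i)} X_{(i)}^*$ is contained within the set of the eigenvalues of $\frac1n XX^*$, then the result is immediate from Proposition 2 in Appendix F. We can therefore assume the existence of eigenvalues of $\frac1n X_{(i)} X_{(i)}^*$ which are not eigenvalues of $\frac1n XX^*$. By definition, the eigenvalues of $\frac1n X_{(i)} X_{(i)}^*$ solve the equation in $\lambda$

$$\det\left( \frac1n X_{(i)} X_{(i)}^* - \lambda I_N \right) = 0.$$

Take $\lambda$ not to be also an eigenvalue of $\frac1n XX^*$. Then, developing the above expression, we get

$$\begin{aligned}
\det\left( \frac1n X_{(i)} X_{(i)}^* - \lambda I_N \right) &= \det\left( \frac1n XX^* - \frac1n x_i x_i^* - \lambda I_N \right) \\
&= \det Q(\lambda) \det\left( I_N - Q(\lambda)^{-1} \frac1n x_i x_i^* \right) \\
&= \det Q(\lambda) \left( 1 - \frac1n x_i^* Q(\lambda)^{-1} x_i \right)
\end{aligned}$$

with the notation $Q(\lambda) = \frac1n XX^* - \lambda I_N$, where we used $\det(I_N + AB) = \det(I_p + BA)$ in the last line, for $A \in \mathbb{C}^{N \times p}$ and $B \in \mathbb{C}^{p \times N}$, with $p = 1$ here.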
Therefore, since $\lambda$ cannot cancel the first determinant,

$$\frac1n x_i^* Q(\lambda)^{-1} x_i = 1.$$

Let us study the function

$$x \mapsto f_{n,i}(x) \triangleq \frac1n x_i^* \left( \frac1n XX^* - xI_N \right)^{-1} x_i.$$
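The characterization of the leave-one-out eigenvalues through $f_{n,i}(\lambda) = 1$, together with the interlacing (7) discussed next, can be verified on a small example. The sketch assumes small real Gaussian data purely for illustration; it evaluates $f_{n,i}$ at each eigenvalue of $\frac1n X_{(i)} X_{(i)}^*$ (which, generically, is not an eigenvalue of $\frac1n XX^*$).

```python
import numpy as np

rng = np.random.default_rng(4)
N, n, i = 6, 15, 0
X = rng.standard_normal((N, n))
x_i = X[:, i]
X_i = np.delete(X, i, axis=1)                     # X_(i): X with column i removed

S = X @ X.T / n                                   # (1/n) X X^T
S_i = X_i @ X_i.T / n                             # (1/n) X_(i) X_(i)^T

def f(lam):
    """f_{n,i}(lam) = (1/n) x_i^T ((1/n) X X^T - lam I_N)^{-1} x_i."""
    return x_i @ np.linalg.solve(S - lam * np.eye(N), x_i) / n

lam_loo = np.linalg.eigvalsh(S_i)                 # increasing order
lam_full = np.linalg.eigvalsh(S)
values = np.array([f(lam) for lam in lam_loo])
print(values)                                     # all entries should be close to 1
```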



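First note, from a basic study of the asymptotes and limits of $f_{n,i}(x)$, that the eigenvalues of $\frac1n X_{(i)} X_{(i)}^*$ are interleaved with those of $\frac1n XX^*$ (a property known as Weyl's interlacing lemma) and in particular that

$$\lambda_1\left( \frac1n X_{(i)} X_{(i)}^* \right) \le \lambda_1\left( \frac1n XX^* \right) \le \lambda_2\left( \frac1n X_{(i)} X_{(i)}^* \right). \qquad (7)$$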



Since $\lambda_1(\frac1n XX^*)$ is a.s. away from zero for all large $N$ (Proposition 2), only $\lambda_1(\frac1n X_{(i)} X_{(i)}^*)$ may remain in the neighborhood of zero for at least one $i \le n$, for all large $n$.

We will show that this is impossible. Precisely, for all large $n$ a.s., we will show that $f_{n,i}(x) < 1$ for any $i \le n$ and for all $x$ in some interval $[0, \xi)$, $\xi > 0$, confirming that no eigenvalue of $\frac1n X_{(i)} X_{(i)}^*$ can be found there. For this, we first use the fact that the $f_{n,i}(x)$ can be uniformly well estimated for all $x < 0$ through Proposition 1 in Appendix F by a quantity strictly less than one. We then show that the growth of the $f_{n,i}(x)$ for $x$ in a neighborhood of zero can be controlled, so as to ensure that none of them reaches 1 for $x < \xi$. This will conclude the proof.

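We start with the study of $f_{n,i}(x)$ on $\mathbb{R}^-$. From Lemma 3,

$$f_{n,i}(x) = \frac{\frac1n x_i^* \left( \frac1n X_{(i)} X_{(i)}^* - xI_N \right)^{-1} x_i}{1 + \frac1n x_i^* \left( \frac1n X_{(i)} X_{(i)}^* - xI_N \right)^{-1} x_i}.$$

Define

$$f_n(x) = \frac{c_N e_N(x)}{1 + c_N e_N(x)}$$

with $e_N(x)$ the unique positive solution of (see Proposition 1)

$$e_N(z) = \int \frac{t}{(1 + c_N e_N(z))^{-1} t - z}\, dF^{C_N}(t). \qquad (8)$$

Then, with $Q(x) \triangleq \frac1n XX^* - xI_N$ and $Q_i(x) \triangleq \frac1n X_{(i)} X_{(i)}^* - xI_N$,

$$\begin{aligned}
\left| f_{n,i}(x) - f_n(x) \right| &= \left| \frac{\frac1n x_i^* Q_i(x)^{-1} x_i}{1 + \frac1n x_i^* Q_i(x)^{-1} x_i} - \frac{c_N e_N(x)}{1 + c_N e_N(x)} \right| \\
&\le \left| \frac1n x_i^* Q_i(x)^{-1} x_i - c_N e_N(x) \right| \\
&\le \left| \frac1n x_i^* Q_i(x)^{-1} x_i - \frac1n \operatorname{tr} C_N Q_i(x)^{-1} \right| + \left| \frac1n \operatorname{tr} C_N Q_i(x)^{-1} - \frac1n \operatorname{tr} C_N Q(x)^{-1} \right| \\
&\quad + \left| \frac1n \operatorname{tr} C_N Q(x)^{-1} - c_N e_N(x) \right| \qquad (9)
\end{aligned}$$

where the first inequality uses $\left| \frac{a}{1+a} - \frac{b}{1+b} \right| = \frac{|a - b|}{(1+a)(1+b)} \le |a - b|$ for $a, b \ge 0$ (recall that $x < 0$, so that $Q_i(x) \succ 0$).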
Using $(a + b + c)^p \le 3^p (a^p + b^p + c^p)$ for $a, b, c > 0$ and $p \ge 1$ (Hölder's inequality), and applying Lemma 5, Lemma 4, and Proposition 1 to the right-hand side terms of (9), respectively, with $p = 4 + \eta/2$, we obtain

$$\mathbb{E}\left[ \left| f_{n,i}(x) - f_n(x) \right|^{4 + \frac{\eta}{2}} \right] \le \frac{K}{n^{2 + \frac{\eta}{4}}}$$

for some constant $K$ independent of $i$, where we implicitly used A1. Therefore, using Boole's inequality on the above events for $i \le n$, and the Markov inequality, for all $\zeta > 0$,

$$P\left( \max_{i \le n} \left| f_{n,i}(x) - f_n(x) \right| > \zeta \right) \le \sum_{i \le n} P\left( \left| f_{n,i}(x) - f_n(x) \right| > \zeta \right) \le \frac{K}{\zeta^{4 + \frac{\eta}{2}}\, n^{1 + \frac{\eta}{4}}}.$$

The Borel-Cantelli lemma therefore ensures, for all $x < 0$,

$$\max_{i \le n} \left| f_{n,i}(x) - f_n(x) \right| \xrightarrow{\text{a.s.}} 0. \qquad (10)$$

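We now extend the study of $f_{n,i}(x)$ to $x$ in a neighborhood of zero. From Proposition 2, $\lambda_1(\frac1n XX^*) > C_-(1 - \sqrt{c_+})^2$ for all large $n$ a.s. (recall that $\limsup_N c_N < c_+ < 1$), so that $f_{n,i}(x)$ is well defined and continuously differentiable on $U = (-\varepsilon, \varepsilon)$ for $0 < \varepsilon < C_-(1 - \sqrt{c_+})^2$, for all large $n$ a.s. Take $x \in U$. Since the smallest eigenvalue of $\frac1n XX^* - xI_N$ is lower bounded by $C_-(1 - \sqrt{c_+})^2 - \varepsilon$ for all large $n$, and since

$$\max_{i \le n} \left| \frac1n \|x_i\|^2 - \frac1n \operatorname{tr} C_N \right| \xrightarrow{\text{a.s.}} 0$$

(using similar arguments based on the Boole and Markov inequality reasoning as above), we also have that for all large $n$ a.s.

$$\max_{i \le n} \sup_{x \in U} \left| f_{n,i}'(x) \right| = \max_{i \le n} \sup_{x \in U} \frac1n x_i^* Q(x)^{-2} x_i < \frac{c_+ C_+}{\left( C_-(1 - \sqrt{c_+})^2 - \varepsilon \right)^2} \triangleq K'$$

where we used $\limsup_N \frac1n \operatorname{tr} C_N < c_+ C_+$.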

From this result, along with the continuity of the $f_{n,i}$, for $x \in U$ and for all large $n$ a.s.,

$$f_{n,i}(x) < f_{n,i}(-x) + 2xK'.$$

In particular, for $\xi = \min\{\varepsilon/2, (1 - c_+)/(2K')\}$,

$$f_{n,i}(\xi) < f_{n,i}(-\xi) + (1 - c_+). \qquad (11)$$

Since $e_N(0) = 1 + c_N e_N(0)$ by definition (15),

$$f_n(0) = c_N < c_+$$

and $f_n(x)$ is continuous and increasing on $U$, so that

$$f_n(-\xi) < c_+.$$

Recalling (10), we then conclude that, for all large $n$ a.s.,

$$\max_{i \le n} f_{n,i}(-\xi) < c_+$$

which, along with (11), gives, for all large $n$ a.s.,

$$\max_{i \le n} f_{n,i}(\xi) < 1.$$

i<n 

Since $f_{n,i}(x)$ is continuous and increasing on $[0, \xi)$, the equation $f_{n,i}(x) = 1$ has no solution on this interval for any $i \le n$, for all large $n$ a.s., which concludes the proof.
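Lemma 1 can be illustrated by computing the smallest eigenvalue of every leave-one-out matrix $\frac1n X_{(i)} X_{(i)}^*$. The sketch assumes the hypothetical setting $C_N = I_N$ with real Gaussian entries, in which case the natural reference point is the left edge $(1 - \sqrt{c})^2$ of the Marchenko-Pastur law.

```python
import numpy as np

rng = np.random.default_rng(5)
N, n = 100, 400
X = rng.standard_normal((N, n))                   # hypothetical setting C_N = I_N
S = X @ X.T / n

# Smallest eigenvalue of (1/n) X_(i) X_(i)^T = S - x_i x_i^T / n, for every i.
lam_min_loo = np.array([
    np.linalg.eigvalsh(S - np.outer(X[:, i], X[:, i]) / n)[0] for i in range(n)
])
lam_min_S = np.linalg.eigvalsh(S)[0]
mp_left_edge = (1.0 - np.sqrt(N / n)) ** 2        # Marchenko-Pastur left edge, here 0.25
print(lam_min_loo.min(), lam_min_S, mp_left_edge)
```

The first inequality of (7) forces every leave-one-out smallest eigenvalue below $\lambda_1(\frac1n XX^*)$; the point of Lemma 1 is that they nevertheless stay bounded away from zero uniformly in $i$.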



Appendix C 
Proof of Lemma 2 

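Define $S_{N,(i)} = S_N - \frac1n x_i x_i^*$ and denote $S_{N,(i)}^{-1}$ its inverse when it exists or the identity matrix otherwise. Take $2 < p \le 4 + \eta/2$ (see A1) and $\varepsilon > 0$ as in Lemma 1. Denoting $\mathbb{E}_{x_i}$ the expectation with respect to $x_i$ and $\varphi_i \triangleq 1_{\{\lambda_1(S_{N,(i)}) > \varepsilon\}}$,

$$\mathbb{E}_{x_i}\left[ \varphi_i \left| \frac{\frac1n x_i^* S_{N,(i)}^{-1} x_i}{1 + \frac1n x_i^* S_{N,(i)}^{-1} x_i} - \frac{\frac1n \operatorname{tr} C_N S_{N,(i)}^{-1}}{1 + \frac1n \operatorname{tr} C_N S_{N,(i)}^{-1}} \right|^p \right] \le \mathbb{E}_{x_i}\left[ \varphi_i \left| \frac1n x_i^* S_{N,(i)}^{-1} x_i - \frac1n \operatorname{tr} C_N S_{N,(i)}^{-1} \right|^p \right].$$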



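Recalling that $x_i = A_N y_i$ with $y_i$ having independent zero mean and unit variance entries, from Lemma 5, we have

$$\mathbb{E}_{x_i}\left[ \varphi_i \left| \frac1n x_i^* S_{N,(i)}^{-1} x_i - \frac1n \operatorname{tr} C_N S_{N,(i)}^{-1} \right|^p \right] \le \varphi_i \frac{K_p}{n^p} \left[ \left( \nu_4 \operatorname{tr} (C_N S_{N,(i)}^{-1})^2 \right)^{\frac p2} + \nu_{2p} \operatorname{tr} (C_N S_{N,(i)}^{-1})^p \right]$$

for some constant $K_p$ depending only on $p$, with $\nu_\ell$ any value such that $\mathbb{E}[|y_{ij}|^\ell] \le \nu_\ell$ (well defined from A1). Using $\operatorname{tr} A^k \le (\operatorname{tr} A)^k$ for $A \in \mathbb{C}^{N \times N}$ nonnegative definite and $k \ge 1$, with here $A = (C_N^{1/2} S_{N,(i)}^{-1} C_N^{1/2})^2$ (so that $\operatorname{tr} A^{p/2} = \operatorname{tr} (C_N S_{N,(i)}^{-1})^p$) and $k = p/2$, this gives

$$\begin{aligned}
\mathbb{E}_{x_i}\left[ \varphi_i \left| \frac{\frac1n x_i^* S_{N,(i)}^{-1} x_i}{1 + \frac1n x_i^* S_{N,(i)}^{-1} x_i} - \frac{\frac1n \operatorname{tr} C_N S_{N,(i)}^{-1}}{1 + \frac1n \operatorname{tr} C_N S_{N,(i)}^{-1}} \right|^p \right] &\le K_p \left( \nu_4^{\frac p2} + \nu_{2p} \right) \left( \frac{\varphi_i}{n^2} \operatorname{tr} (C_N S_{N,(i)}^{-1})^2 \right)^{\frac p2} \\
&\le \frac{K_p \left( \nu_4^{\frac p2} + \nu_{2p} \right) (c_+ C_+^2)^{\frac p2}}{\varepsilon^p\, n^{\frac p2}} \qquad (12)
\end{aligned}$$

where, in (12), we used $\operatorname{tr} AB \le \|A\| \operatorname{tr} B$ for $A, B \succeq 0$, $\varphi_i \le 1$, $\|S_{N,(i)}^{-1}\| \le \varepsilon^{-1}$ when $\varphi_i = 1$, and $\frac1n \operatorname{tr} C_N^2 \le c_+ C_+^2$.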

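This being valid irrespective of $X_{(i)}$, we can take the expectation of the above expression over $X_{(i)}$ to obtain

$$\mathbb{E}\left[ \varphi_i \left| \frac{\frac1n x_i^* S_{N,(i)}^{-1} x_i}{1 + \frac1n x_i^* S_{N,(i)}^{-1} x_i} - \frac{\frac1n \operatorname{tr} C_N S_{N,(i)}^{-1}}{1 + \frac1n \operatorname{tr} C_N S_{N,(i)}^{-1}} \right|^p \right] \le \frac{K}{n^{\frac p2}}$$

for some constant $K > 0$ independent of $i$.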
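Therefore, from Lemma 3,

$$\mathbb{E}\left[ \varphi_i \left| \frac1n x_i^* S_N^{-1} x_i - \frac{\frac1n \operatorname{tr} C_N S_{N,(i)}^{-1}}{1 + \frac1n \operatorname{tr} C_N S_{N,(i)}^{-1}} \right|^p \right] \le \frac{K}{n^{\frac p2}}.$$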



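Using Boole's inequality on the $n$ events above with $i = 1, \ldots, n$, and the Markov inequality, for $\zeta > 0$,

$$P\left( \max_{i \le n} \varphi_i \left| \frac1n x_i^* S_N^{-1} x_i - \frac{\frac1n \operatorname{tr} C_N S_{N,(i)}^{-1}}{1 + \frac1n \operatorname{tr} C_N S_{N,(i)}^{-1}} \right| > \zeta \right) \le \frac{K}{\zeta^p\, n^{\frac p2 - 1}}.$$

Choosing $4 < p \le 4 + \eta/2$, the right-hand side is summable. The Borel-Cantelli lemma then ensures that

$$\max_{i \le n} \varphi_i \left| \frac1n x_i^* S_N^{-1} x_i - \frac{\frac1n \operatorname{tr} C_N S_{N,(i)}^{-1}}{1 + \frac1n \operatorname{tr} C_N S_{N,(i)}^{-1}} \right| \xrightarrow{\text{a.s.}} 0.$$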



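But, from Lemma 1, $\min_i \{\varphi_i\} = 1$ for all large $n$ a.s. Therefore, we conclude

$$\max_{i \le n} \left| \frac1n x_i^* S_N^{-1} x_i - \frac{\frac1n \operatorname{tr} C_N S_{N,(i)}^{-1}}{1 + \frac1n \operatorname{tr} C_N S_{N,(i)}^{-1}} \right| \xrightarrow{\text{a.s.}} 0. \qquad (13)$$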



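Since $S_{N,(i)} - \varepsilon I_N \succ 0$ for these large $n$, we also have

$$\begin{aligned}
\max_{i \le n} \left| \frac{\frac1n \operatorname{tr} C_N S_{N,(i)}^{-1}}{1 + \frac1n \operatorname{tr} C_N S_{N,(i)}^{-1}} - \frac{\frac1n \operatorname{tr} C_N S_N^{-1}}{1 + \frac1n \operatorname{tr} C_N S_N^{-1}} \right| &= \max_{i \le n} \frac{\left| \frac1n \operatorname{tr} C_N S_N^{-1} - \frac1n \operatorname{tr} C_N S_{N,(i)}^{-1} \right|}{\left( 1 + \frac1n \operatorname{tr} C_N S_{N,(i)}^{-1} \right) \left( 1 + \frac1n \operatorname{tr} C_N S_N^{-1} \right)} \\
&\le \frac{C_+}{n \varepsilon}
\end{aligned}$$

where, in the last inequality, we used Lemma 4 with $B = C_N$, $A = S_{N,(i)} - \varepsilon I_N$ and $x = \varepsilon$, along with the fact that $(1 + x)^{-1} \le 1$ for $x \ge 0$.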

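From Proposition 1 (with $e_N(0) = (1 - c_N)^{-1}$ and $\frac1n = \frac{c_N}{N}$), since $\lambda_1(S_N) \ge \lambda_1(S_{N,(i)}) > \varepsilon$ for these large $n$ (see (7)), we also have

$$\frac1n \operatorname{tr} C_N S_N^{-1} - \frac{c_N}{1 - c_N} \xrightarrow{\text{a.s.}} 0$$

and thus, from $\frac{c_N (1 - c_N)^{-1}}{1 + c_N (1 - c_N)^{-1}} = c_N$,

$$\frac{\frac1n \operatorname{tr} C_N S_N^{-1}}{1 + \frac1n \operatorname{tr} C_N S_N^{-1}} - c_N \xrightarrow{\text{a.s.}} 0.$$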



Putting things together, this finally gives

$$\max_{i \le n} \left| \frac1n x_i^* S_N^{-1} x_i - c_N \right| \xrightarrow{\text{a.s.}} 0,$$

an expression which, since $c_N > c_- > 0$ for all large $N$, can be divided by $c_N$, concluding the proof.
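Lemma 2 can be observed on simulated data by comparing the maximal deviation $\max_i |\frac1N x_i^* S_N^{-1} x_i - 1|$ at two problem sizes with the same ratio $c_N = 1/4$. The sketch assumes the hypothetical setting $C_N = I_N$ with real Gaussian entries; it only shows the trend for one random draw, not the almost sure statement.

```python
import numpy as np

def leverage_deviation(N, n, rng):
    """Return max_i |(1/N) x_i^T S_N^{-1} x_i - 1| and the mean of the quadratic forms."""
    X = rng.standard_normal((N, n))                 # hypothetical setting C_N = I_N
    S = X @ X.T / n
    forms = np.sum(X * np.linalg.solve(S, X), axis=0) / N
    return np.abs(forms - 1.0).max(), forms.mean()

rng = np.random.default_rng(6)
dev_small, mean_small = leverage_deviation(50, 200, rng)
dev_large, mean_large = leverage_deviation(400, 1600, rng)
print(dev_small, dev_large, mean_small, mean_large)
```

The means equal one up to rounding, since $\frac1n \sum_i x_i^* S_N^{-1} x_i = \operatorname{tr} I_N = N$; Lemma 2 is the much stronger statement that the maximal deviation itself vanishes.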
Appendix D
Proof of Theorem 3 

The proof immediately follows from the arguments of [25]. When the scalability assumption is satisfied with strict inequality, the result is exactly [25, Theorem 2]. When the scalability assumption is reduced to a loose inequality, [25, Theorem 1] does not hold, and therefore uniqueness cannot be guaranteed. Nonetheless, the existence of a solution follows from the proof of [25, Lemma 1], which does not call for the scalability assumption. Indeed, since there exists $(q_1, \ldots, q_n)$ such that $q_i > h_i(q_1, \ldots, q_n)$ for all $i$, the algorithm

$$q_j^{(t+1)} = h_j\left( q_1^{(t)}, \ldots, q_n^{(t)} \right), \quad j = 1, \ldots, n,$$

with $q_j^{(0)} = q_j$, satisfies $q_j^{(1)} < q_j^{(0)}$ for all $j$. Assuming $q_j^{(t+1)} \le q_j^{(t)}$ for all $j$, the monotonicity assumption ensures that $q_j^{(t+2)} \le q_j^{(t+1)}$, which, by recursion, means that $(q_j^{(t)})_{t \ge 0}$ is a non-increasing sequence. Now, since $q_j^{(t)}$ is in the image of $h_j$, $q_j^{(t)} > 0$ by positivity, and therefore $q_j^{(t)}$ converges to a fixed-point (not necessarily unique). Such a fixed-point therefore exists. Note that [25, Lemma 2] provides an algorithm for reaching this fixed-point, starting with $q_j^{(0)} = 0$ for all $j$.
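The monotone scheme above can be run directly. This sketch reuses the hypothetical assumptions of the existence check in the proof of Theorem 1: $u(x) = 3/(1+x)$, real Gaussian data with $C_N = I_N$, and $h_j(q) = \frac1N x_j^* (\frac1n \sum_i u(q_i) x_i x_i^*)^{-1} x_j$ as our reading of (4). It starts from $q_j^{(0)} = 5$, a point expected to satisfy $q_j > h_j(q)$, and records the iterates to check that every coordinate is non-increasing.

```python
import numpy as np

def u(x):
    # Hypothetical weight; phi(x) = 3x / (1 + x) and phi(5) = 2.5 = phi_inf - 0.5.
    return 3.0 / (1.0 + x)

def h(q, X):
    """Vector of h_j(q) = (1/N) x_j^T ((1/n) sum_i u(q_i) x_i x_i^T)^{-1} x_j."""
    N, n = X.shape
    M = (X * u(q)) @ X.T / n
    return np.sum(X * np.linalg.solve(M, X), axis=0) / N

rng = np.random.default_rng(7)
N, n = 100, 400
X = rng.standard_normal((N, n))

q = np.full(n, 5.0)             # q^(0), expected to satisfy q_j > h_j(q) for all j
iterates = [q]
for _ in range(200):
    q = h(q, X)                 # q^(t+1) = h(q^(t))
    iterates.append(q)
iterates = np.array(iterates)

largest_increase = np.diff(iterates, axis=0).max()   # <= 0 up to rounding errors
residual = np.abs(h(q, X) - q).max()                  # fixed-point residual
print(largest_increase, residual)
```

For this $u$ the scalability inequality is strict, so the limit is the unique fixed point; as for the matrix iteration, it satisfies $\frac1n \sum_j \phi(q_j) = 1$.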

Appendix E 
Proof of Theorem 2 

If $\hat C_N$ is replaced by $S_N$ in the statement of the result, then Theorem 2 is exactly [19, Theorem 2], which is a direct consequence of [13, Theorem 3] with some updated remarks on the $\mu_j$ found in the discussion around [21, Theorem 17.1]. In order to prove Theorem 2, we need to justify the substitution of $S_N$ by $\hat C_N$. First observe that the result is independent of a scaling of $S_N$, and therefore we can freely substitute $S_N$ by $\phi^{-1}(1) \hat C_N$ instead of $\hat C_N$. Using the notations of Mestre in [13], we first need to extend [13, Proposition 4]. Call $\tilde g_M(z)$ the equivalent of $\hat g_M(z)$ designed from the eigenvectors of $\phi^{-1}(1) \hat C_N$ instead of those of $S_N$ (referred to as $\hat R_M$ in [13], with $M$ in place of $N$, and $N$ in place of $n$). Then, on the chosen rectangular contour $\partial \mathcal{R}_y^-(m)$, both $\tilde g_M(z)$ and $\hat g_M(z)$ are a.s. bounded holomorphic functions for all large $N$; this is due to the exact separation [14, Theorem 3] of the eigenvalues of $S_N$ and the fact that Corollary 1 ensures the convergence between the eigenvalues of $\phi^{-1}(1) \hat C_N$ and of $S_N$.

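From [13, Equation (29)], $\hat g_M(z)$ consists of the functions $\hat b_M(z)$ and $\hat m_M(z)$, and we denote by $\tilde b_M(z)$ and $\tilde m_M(z)$ their equivalents for $\phi^{-1}(1) \hat C_N$. We need to show that the respective differences of these functions go to zero. From the definition [13, Equation (4)] of $\hat b_M(z)$, Theorem 1, and the fact that $\left| \frac1N \operatorname{tr}(A^{-1} - B^{-1}) \right| \le \|A^{-1}\| \|B^{-1}\| \|A - B\|$ for invertible $A, B \in \mathbb{C}^{N \times N}$, we have immediately that

$$\sup_{z \in \partial \mathcal{R}_y^-(m)} \left| \tilde b_M(z) - \hat b_M(z) \right| \xrightarrow{\text{a.s.}} 0.$$

Similarly, using [13, Equation (6)] and $\left| a^* (A^{-1} - B^{-1}) b \right| \le \|a\| \|b\| \|A^{-1}\| \|B^{-1}\| \|A - B\|$ for $a, b \in \mathbb{C}^N$, we find

$$\sup_{z \in \partial \mathcal{R}_y^-(m)} \left| \tilde m_M(z) - \hat m_M(z) \right| \xrightarrow{\text{a.s.}} 0.$$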
By the dominated convergence theorem, this gives

$$\oint_{\partial \mathcal{R}_y^-(m)} \left( \tilde g_M(z) - \hat g_M(z) \right) dz \xrightarrow{\text{a.s.}} 0,$$

which then immediately extends [13, Proposition 4] to the present scenario. The second step to be proved is that the residue calculus performed in [13, Equations (32)-(33)] carries over to the present scenario. The poles within the contour $\partial \mathcal{R}_y^-(m)$ are the $\lambda_k$ and the $\mu_k$ found inside the contour. The indices $k$ such that the $\lambda_k$ and $\mu_k$ are within $\partial \mathcal{R}_y^-(m)$ are the same for $S_N$ and $\phi^{-1}(1) \hat C_N$ for all large $N$, due to the exact separation property and Corollary 1. This completes the proof.

Appendix F 
Useful lemmas and results 

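Lemma 3 (A matrix-inversion lemma): Let $x \in \mathbb{C}^N$, $A \in \mathbb{C}^{N \times N}$, and $t \in \mathbb{R}$. Then, whenever the inverses exist,

$$x^* (A + t x x^*)^{-1} x = x^* A^{-1} x \left( 1 + t x^* A^{-1} x \right)^{-1}.$$

Lemma 4 (Rank-one perturbation): Let $v \in \mathbb{C}^N$, $A, B \in \mathbb{C}^{N \times N}$ nonnegative definite, and $x > 0$. Then

$$\left| \operatorname{tr} B (A + vv^* + xI_N)^{-1} - \operatorname{tr} B (A + xI_N)^{-1} \right| \le x^{-1} \|B\|.$$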
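Both lemmas are deterministic and can be checked numerically on random real instances. In the sketch below, the matrices and vectors are arbitrary illustrative choices; the positive scalar called $x$ in Lemma 4 is named `s` to avoid a clash with the vector $x$ of Lemma 3.

```python
import numpy as np

rng = np.random.default_rng(8)
N = 20
G, H = rng.standard_normal((N, N)), rng.standard_normal((N, N))
A = G @ G.T / N + 0.1 * np.eye(N)     # nonnegative definite (here also invertible)
B = H @ H.T / N                       # nonnegative definite
x = rng.standard_normal(N)
v = rng.standard_normal(N)
t = 0.7                               # real scalar of Lemma 3
s = 0.5                               # positive scalar called x in Lemma 4
I = np.eye(N)

# Lemma 3: x^T (A + t x x^T)^{-1} x = x^T A^{-1} x / (1 + t x^T A^{-1} x)
lhs3 = x @ np.linalg.solve(A + t * np.outer(x, x), x)
a = x @ np.linalg.solve(A, x)
rhs3 = a / (1.0 + t * a)

# Lemma 4: |tr B (A + v v^T + s I)^{-1} - tr B (A + s I)^{-1}| <= ||B|| / s
gap4 = abs(np.trace(B @ np.linalg.inv(A + np.outer(v, v) + s * I))
           - np.trace(B @ np.linalg.inv(A + s * I)))
bound4 = np.linalg.norm(B, 2) / s
print(lhs3, rhs3, gap4, bound4)
```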

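Lemma 5 (Trace lemma): [27, Lemma B.26] Let $A \in \mathbb{C}^{N \times N}$ be non-random and $y = [y_1, \ldots, y_N]^T \in \mathbb{C}^N$ be a vector of independent entries with $\mathbb{E}[y_i] = 0$, $\mathbb{E}[|y_i|^2] = 1$, and $\mathbb{E}[|y_i|^\ell] \le \nu_\ell$ for all $\ell \le 2p$, with $p \ge 2$. Then,

$$\mathbb{E}\left[ \left| y^* A y - \operatorname{tr} A \right|^p \right] \le C_p \left( (\nu_4 \operatorname{tr} AA^*)^{\frac p2} + \nu_{2p} \operatorname{tr} (AA^*)^{\frac p2} \right)$$

for $C_p$ a constant depending on $p$ only.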

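Proposition 1 (A random matrix result): Let $X = [x_1, \ldots, x_n] \in \mathbb{C}^{N \times n}$ with $x_i = A_N y_i$, $A_N \in \mathbb{C}^{N \times M}$, $M \ge N$, where $y_i = [y_{i1}, \ldots, y_{iM}]^T \in \mathbb{C}^M$ has independent entries satisfying $\mathbb{E}[y_{ij}] = 0$, $\mathbb{E}[|y_{ij}|^2] = 1$, $\mathbb{E}[|y_{ij}|^\ell] \le \nu_\ell$ for all $\ell \le 2p$, and $C_N = A_N A_N^*$ is nonnegative definite with $\|C_N\| \le C_+ < \infty$. Assume $c_N = N/n$ and $\bar c_N = M/N \ge 1$ satisfy $\limsup_N c_N < \infty$ and $\limsup_N \bar c_N < \infty$, as $N, n, M \to \infty$. Then, for $z < 0$ and $p \ge 2$,

$$\mathbb{E}\left[ \left| \frac1N \operatorname{tr} C_N \left( \frac1n XX^* - zI_N \right)^{-1} - e_N(z) \right|^p \right] \le \frac{K_p}{N^{\frac p2}} \qquad (14)$$

for $K_p$ a constant depending only on $p$, $\nu_\ell$ for $\ell \le 2p$, and $z$, while $e_N(z)$ is the unique positive solution of

$$e_N(z) = \int \frac{t}{(1 + c_N e_N(z))^{-1} t - z}\, dF^{C_N}(t) \qquad (15)$$

where $F^{C_N}$ is the eigenvalue distribution of $C_N$. The function $\mathbb{R}^- \to \mathbb{R}^+$, $z \mapsto e_N(z)$, is increasing.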

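Moreover, for any $N_0$, as $N, n \to \infty$ with $\limsup_N c_N < \infty$, for $z \in \mathbb{R} \setminus \mathcal{S}_{N_0}$, where $\mathcal{S}_{N_0}$ is the union of the supports of the eigenvalue distributions of $\frac1n XX^*$ for all $N \ge N_0$,

$$\frac1N \operatorname{tr} C_N \left( \frac1n XX^* - zI_N \right)^{-1} - e_N(z) \xrightarrow{\text{a.s.}} 0. \qquad (16)$$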

Proof: To prove the first part of Proposition 1, we follow the steps of the proof of [28]. Note first that we can append $A_N$ into an $M \times M$ matrix by adding rows of zeros, without altering the left-hand side of (14). Using the notations of [28], we consider the simple case where $A_n = 0$ and $\sigma_{ij}^n = C_i^N$, where $C_i^N$ denotes the $i$-th eigenvalue of $C_N$. Although this updated proof of [28] would require $C_N$ to be diagonal, it is rather easy to generalize to non-diagonal $C_N$ (see e.g. [29], [30]). The proof then extends to the non-i.i.d. case when using Lemma 5 instead of [28, (B.1)]. The second part follows from the first part immediately for $z < 0$. In order to extend the result to $z \in \mathbb{R} \setminus \mathcal{S}_{N_0}$, note that both left-hand side terms in (16) are uniformly bounded in any compact $\mathcal{D}$ away from $\mathcal{S}_{N_0}$ and including part of $\mathbb{R}^-$, and are holomorphic on $\mathcal{D}$. From Vitali's convergence theorem [31], their difference therefore tends to zero on $\mathcal{D}$, which is what we need. ■
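The deterministic equivalent $e_N(z)$ of (15) is easy to compute by fixed-point iteration for $z \le 0$, which gives a quick numerical view of (16) and of the identity $f_n(0) = c_N$ used in the proof of Lemma 1. The sketch below assumes a hypothetical spectrum for $C_N$ (uniformly spaced in $[1, 3]$), real Gaussian $y_i$, and $x_i = C_N^{1/2} y_i$; the helper name `solve_e` is ours.

```python
import numpy as np

def solve_e(z, eig_C, c, tol=1e-13, max_iter=10_000):
    """Fixed-point iteration for e = mean over t of t / (t / (1 + c e) - z), for z <= 0."""
    e = 1.0
    for _ in range(max_iter):
        e_new = np.mean(eig_C / (eig_C / (1.0 + c * e) - z))
        if abs(e_new - e) <= tol * abs(e_new):
            return e_new
        e = e_new
    raise RuntimeError("fixed-point iteration did not converge")

rng = np.random.default_rng(9)
N, n = 200, 800
c = N / n
eig_C = np.linspace(1.0, 3.0, N)                              # hypothetical spectrum of C_N
X = np.sqrt(eig_C)[:, None] * rng.standard_normal((N, n))     # x_i = C_N^{1/2} y_i
S = X @ X.T / n

z = -1.0
empirical = np.sum(eig_C * np.diag(np.linalg.inv(S - z * np.eye(N)))) / N
deterministic = solve_e(z, eig_C, c)

e0 = solve_e(0.0, eig_C, c)       # at z = 0 the equation reduces to e = 1 + c e
f0 = c * e0 / (1.0 + c * e0)      # f_n(0), which should equal c_N
print(empirical, deterministic, e0, f0)
```

At $z = 0$ the iteration is simply $e \leftarrow 1 + c_N e$, which converges to $(1 - c_N)^{-1}$ whenever $c_N < 1$, independently of the spectrum of $C_N$.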
Proposition 2 (No eigenvalue outside the support): Let $X = [x_1, \ldots, x_n] \in \mathbb{C}^{N \times n}$ with $x_i = A_N y_i$, $A_N \in \mathbb{C}^{N \times M}$, where $y_i = [y_{i1}, \ldots, y_{iM}] \in \mathbb{C}^M$ has independent entries satisfying $\mathbb{E}[y_{ij}] = 0$, $\mathbb{E}[|y_{ij}|^2] = 1$ and $\mathbb{E}[|y_{ij}|^{4+\eta}] < \alpha$ for some $\eta, \alpha > 0$, $C_N = A_N A_N^*$ has bounded spectral norm, and $N, n, M \to \infty$ with $\limsup_N N/n < 1$ and $1 \le \limsup_N M/N < \infty$. Let $N_0$ be an integer and $[a, b] \subset \mathbb{R} \cup \{\pm\infty\}$, $b > a$, a segment outside the closure of the union of the supports $F^{N/n, C_N}$, $N \ge N_0$, with $F^{t,A}$ the limiting support of the eigenvalues of $\frac1n XX^*$ when $C_N$ has the same spectrum as $A$ for all $N$ and $N/n \to t$. Then, for all large $n$ a.s., no eigenvalue of $\frac1n XX^*$ is found in $[a, b]$.

Proof: Appending $A_N$ into an $M \times M$ matrix filled with zeros, this unfolds from [14, Theorem 3] (for which conditions 1)-3) are met), with the supports $F^{N/n, C_N}$ appended with the singleton $\{0\}$. Now, for $A_N \in \mathbb{C}^{N \times M}$ such that $A_N A_N^*$ is positive definite, zero is not an eigenvalue of $\frac1n XX^*$ for all $N$, a.s., which gives the result. Condition 1) of [14, Theorem 3] holds here by definition. Condition 3) is obtained by taking $\psi(x) = x^{2+\eta}$. Condition 2) is obtained by taking $z$ a random variable with Pareto distribution $P(z \le x) = (1 - a^{p-1} x^{1-p}) 1_{x \ge a}$ for $p = 5 + \eta$ and $a = \alpha^{1/(4+\eta)}$; by the Markov inequality,

$$P(|y_{ij}| > x) \le \frac{\alpha}{x^{4+\eta}} = P(z > x), \quad x \ge a.$$

This $z$ has finite $(4+\eta)$-order moment, which therefore enforces Condition 2).